# Optimization produces incorrect results #603

Closed
opened this Issue Jan 5, 2018 · 36 comments

Projects
None yet
2 participants
Member

### rahulmutt commented Jan 5, 2018

 @jarekratajski reported this: ```fibbtcoinner 0 sum presum = sum fibbtcoinner n sum presum = fibbtcoinner (n-1) (sum + presum) sum fibbtco n = fibbtcoinner n 1 0 fibbnaive 0 = 1 fibbnaive 1 = 1 fibbnaive n = fibbnaive ( n-1) + fibbnaive ( n - 2) main = do putStrLn \$ show \$ fibbtco <\$> arr putStrLn \$ show \$ fibbnaive <\$> arr where arr = [0..10]``` produces ``````[1,1,2,4,8,16,32,64,128,256,512] [1,1,2,3,5,8,13,21,34,55,89] `````` when it should produce: ``````[1,1,2,3,5,8,13,21,34,55,89] [1,1,2,3,5,8,13,21,34,55,89] `````` Reproduced on v0.7.

Member

### rahulmutt commented Jan 5, 2018

 This may be related to #478.
Contributor

### jarekratajski commented Jan 5, 2018 • edited

 To be more precise: with -O0 problem prevails. Correct behavior only with -O0 AND BangPatterns params. I am just looking at code of eta compiler... do you have any hint where to look at first?
Contributor

### jarekratajski commented Jan 5, 2018 • edited

 While analyzing decompiled code : `````` public final Closure apply6(StgContext var1, Closure var2, Closure var3, Closure var4, Closure var5, Closure var6, Closure var7) { Object var9 = null; //i guess n Object var10 = null; //i guess sum Object var11 = null; //i guess presum boolean var8 = false; while(true) { while(var8) { Main.sat_s7YH var12 = new Main.sat_s7YH(var3); var1.R1 = var2; Closure var13 = Classes.zeze().enter(var1).apply2(var1, (Closure)var9, var12); if (!(var13 instanceof False)) { return ((Closure)var10).evaluate(var1); } Main.sat_s7YM var14 = new Main.sat_s7YM(var4, (Closure)var10, (Closure)var11); //this must be sum + presum Main.sat_s7YL var15 = new Main.sat_s7YL(var3, (Closure)var9); // this must be n-1 var9 = var15; var10 = var14; // here is the error IMO var11 = var14; // /* this would be correct var11 = var10; var10 = var14; */ var8 = true; } var9 = var5; var10 = var6; var11 = var7; var8 = true; } } `````` I am only guessing... but this brought me to idea of swapping order of params : `````` fibbtcoinner 0 presum sum = sum fibbtcoinner n presum sum = fibbtcoinner (n-1) sum (sum + presum) fibbtco n = fibbtcoinner n 0 1 `````` and now this code gives correct results also in O2. Hope this helps.
Member

### rahulmutt commented Jan 5, 2018

 Thanks for taking a look! So swapping the order of params fixes it. We still need to investigate why the original problem happened. The way to debug stuff like this is: Check that the optimized intermediate code (in this case, STG) does what's expected. To check this, you can: Add `-ddump-stg -ddump-to-file` flags when compiling directly with `eta` Add `eta-options: -ddump-stg -ddump-to-file` in the `executable` stanza of your Cabal file if compiling with `etlas` STG code is fairly simple to understand if you ignore all the unnecessary info, think of it as a very simple functional language. If you have trouble following it, share in a Gist and I can help you out with the parts that are tricky. Now if everything looks fine at the STG level, the next thing to look at is what you did above - the generated Java code. We have to make sure that the translation is happening correctly - most bugs come out of a mismatch between STG & the generated code, and most of the time the issue is with the generated code.
Contributor

### jarekratajski commented Jan 5, 2018 • edited

 Disclaimer: I am total newbie to Haskell and GHC and even compilers. Sorry. But looking at STG I think it is proper ``````Fibb.fibbtcoinner :: forall a_a7U9 a1_a7UA. (GHC.Classes.Eq a_a7U9, GHC.Num.Num a_a7U9, GHC.Num.Num a1_a7UA) => a_a7U9 -> a1_a7UA -> a1_a7UA -> a1_a7UA [GblId, Arity=6, Caf=NoCafRefs, Str=, Unf=OtherCon []] = \r srt:SRT:[] [\$dEq_s8MJ \$dNum_s8MK \$dNum1_s8ML eta_s8MM eta1_s8MN eta2_s8MO] let { lvl1_s8MP [Occ=OnceL] :: a_a7UF [LclId, Str=] = \u srt:SRT:[] [] GHC.Num.fromInteger \$dNum_s8MK Fibb.fibbnaive3; } in let { lvl2_s8MQ [Occ=OnceL] :: a_a7UF [LclId, Str=] = \u srt:SRT:[] [] GHC.Num.fromInteger \$dNum_s8MK Fibb.fibbnaive1; } in let-no-escape { fibbtcoinner1_s8MR [Occ=LoopBreaker] :: a_a7UF -> a1_a7UG -> a1_a7UG -> a1_a7UG [LclId, Arity=3, Str=, Unf=OtherCon []] = sat-only \r srt:SRT:[] [ds_s8MS sum_s8MT presum_s8MU] case GHC.Classes.== \$dEq_s8MJ ds_s8MS lvl2_s8MQ of _ [Occ=Dead] { GHC.Types.False -> let { sat_s8MX [Occ=Once] :: a1_a7UG [LclId, Str=] = \u srt:SRT:[] [] GHC.Num.+ \$dNum1_s8ML sum_s8MT presum_s8MU; } in let { sat_s8MW [Occ=Once] :: a_a7UF [LclId, Str=] = \u srt:SRT:[] [] GHC.Num.- \$dNum_s8MK ds_s8MS lvl1_s8MP; } in fibbtcoinner1_s8MR sat_s8MW sat_s8MX sum_s8MT; GHC.Types.True -> sum_s8MT; }; } in fibbtcoinner1_s8MR eta_s8MM eta1_s8MN eta2_s8MO; `````` Because the call (loop?) has in fact 2 distinct variables: ` } in fibbtcoinner1_s8MR sat_s8MW sat_s8MX sum_s8MT;` Seems that Java produced somehow messes sat_s8MX with sum_s8MT. But at least here at STG level they seem to be different.
Member

### rahulmutt commented Jan 5, 2018

 Here's the output with `-O2` which is less code to look at and makes the problem a lot more clear (it's just as you pointed out): ``` public static Closure main_fibbtcoinner(StgContext paramStgContext, Closure paramClosure1, Closure paramClosure2, Closure paramClosure3) { for (;;) { Type.eqIntegerzh(paramStgContext, paramClosure1, main6()); int i = paramStgContext.I1; if (i == 1) { break; } Closure localClosure1 = Type.plusInteger(paramStgContext, paramClosure2, paramClosure3); Closure localClosure2 = Type.minusInteger(paramStgContext, paramClosure1, main5()); paramClosure1 = localClosure2; paramClosure2 = localClosure1; paramClosure3 = paramClosure2; } return paramClosure2.evaluate(paramStgContext); }```
Member

### rahulmutt commented Jan 5, 2018

 This problem is extremely subtle which is why we didn't encounter this yet. It is more likely to happen with very small and simple recursive functions which don't transform all their arguments when making a recursive call. The problem is that last line `paramClosure3 = paramClosure2` which corresponds to passing the `sum` parameter in the recursive call in a different position. The issue is that this needs to be re-ordered. A topological sort must be done to ensure that we don't set the bindings in incorrect order. It should be done like this: ``` paramClosure3 = paramClosure2; paramClosure1 = localClosure2; paramClosure2 = localClosure1;``` or even this: ``` paramClosure1 = localClosure2; paramClosure3 = paramClosure2; paramClosure2 = localClosure1;``` You made nice observations for not looking at this stuff before :)
Contributor

### jarekratajski commented Jan 5, 2018

 Does it have to do something with? ``````-- TODO: Verify that this is valid in all cases, -- otherwise fall back on the strongly connected components -- algorithm a la GHC multiAssign :: [CgLoc] -> [Code] -> Code multiAssign locs codes = fold \$ zipWith storeLoc locs codes `````` in Layout.sh
Member

### rahulmutt commented Jan 5, 2018 • edited

 Yes exactly that :) It turns out that #478 has the same problem so that will also be fixed if we fix this. The tricky part here is that we can't really inspect inside a `Code` data type since it just has a monadic action that generates it when supplied with a constant pool and some other parameters. This lack of introspection actually cripples our ability to do bytecode optimization, so we really need to create an API to work with the direct bytecode AST instead. We'll need to expose a simple analysis in `codec-jvm` that runs the `Instr` monad and inspects the resulting `ByteString` for the `*load` instructions and returns a `[Int]` - all the slots that were loaded from in the code. Would you be interested in giving this a try?
Member

### rahulmutt commented Jan 5, 2018

 For reference, this is GHC's code to handle the problem: https://github.com/ghc/ghc/blob/master/compiler/codeGen/StgCmmUtils.hs#L375-L444
Contributor

### jarekratajski commented Jan 5, 2018 • edited

 I understand the problem and more or less get the code. I will try - but as for now I am too weak to really solve it. So please do not count on me too much.
Member

### rahulmutt commented Jan 5, 2018

 Sure no worries. Give it a try and share any attempts here.
Member

### rahulmutt commented Jan 6, 2018

 @jarekratajski I was taking a look at `multiAssign` usages and it turns out that both code locations allow you to access `CgLoc` (Code-generation locations) where they are currently sending in `Code`. `CgLoc`'s you can at least pattern match on so it would be easier to construct the topological sort.

### jarekratajski added a commit to jarekratajski/eta that referenced this issue Jan 7, 2018

``` typelead#603 fixing variables assign in TCO with topological sort (to… ```
`…Master)`
``` 9c5bb0f ```
Contributor

### jarekratajski commented Jan 7, 2018 • edited

 I did small trial where I fix only one call of multiAssign and it seems to work. There are some dirty places there. The fix also does not solve the problem of potential Cycles. I've created another sample with cycle and it is still broken. I will continue working on this.
Member

### rahulmutt commented Jan 7, 2018

 Ok great. I gave some feedback on your work so far - great start! Please put your test cases in `tests/basic/multiAssign/`. We don't run those tests yet, but we will eventually and it'll be great to keep track of them.
Member

### rahulmutt commented Jan 9, 2018

 @jarekratajski Do you have a timeline for the fix for this? Ideally, I'd like to see a fix by the end of the week since this is a pretty important bug. If your current schedule does not allow for that, please create a PR with the work you've done so far and I'll finish up the remaining bits.
Contributor

### jarekratajski commented Jan 9, 2018

 @rahulmutt I will have some time in next 48 hours for that - so hopefully I can do it or at least push something better.
Member

### rahulmutt commented Jan 9, 2018

 @jarekratajski Great, thanks for the update.

### jarekratajski added a commit to jarekratajski/eta that referenced this issue Jan 10, 2018

``` typelead#603 no longer working solution ```
``` 3ac051d ```
Contributor

### jarekratajski commented Jan 10, 2018 • edited

 @rahulmutt I need some help. I've tried to port 1 by 1 iines from GHC with Digraph and it does not build packages anymore. And I do not know how to "debug" it efficiently. I suspect I've mixed somehow "stores" with "loads" in Layout.hs unscramble or I did not get how CodeGen monad works Line 161 - 162 in Expr.hs ... are not codes generated by emitMultiAssign lost? If I only knew how to print debugs during build process sensibly maybe I would find out.
Member

### rahulmutt commented Jan 10, 2018

 @jarekratajski Can you show me a diff of your changes? I can take a look at where you are and see if I can help you forward. Are you getting compile-time errors or run-time errors? I can help out with debugging methods if you're specific on what kind of info you need.
Contributor

### jarekratajski commented Jan 10, 2018 • edited

 moment... maybe I did very stupid thing and compiler even says me what code is as ion jarekratajski@3ac051d basically I was preparing PR there
Member

### rahulmutt commented Jan 10, 2018

 I noticed that you changed the version number from `0.7.0` to `0.0.9` - that may be causing problems. The build relies on the fact that all the version numbers are consistent. What is happening when you run `./cleaninstall.sh`?

### jarekratajski added a commit to jarekratajski/eta that referenced this issue Jan 10, 2018

``` typelead#603 warnings cleaned ```
``` a4f5f05 ```
Contributor

### jarekratajski commented Jan 10, 2018

 What I need is to compile eta - without recompiling base packages. I've "screweed" something but I need to test it on a single small file. Otherwise I wil not find out. I did a stupid pull request to easily track changes.
Contributor

### jarekratajski commented Jan 10, 2018

 Version number was changed before - that is now fixed.
Contributor

### jarekratajski commented Jan 10, 2018

 cleaninstall.sh results in `````` uilding library for ghc-prim-0.4.0.0.. [1 of 5] Compiling GHC.Types ( GHC/Types.hs, dist/build/GHC/Types.jar ) [2 of 5] Compiling GHC.Tuple ( GHC/Tuple.hs, dist/build/GHC/Tuple.jar ) [3 of 5] Compiling GHC.CString ( GHC/CString.hs, dist/build/GHC/CString.jar ) eta: panic! (the 'impossible' happened) (Eta version 0.7.0b1): Prelude.head: empty list `````` Must be that I somehow do not emit correct assigns now.

### jarekratajski added a commit to jarekratajski/eta that referenced this issue Jan 10, 2018

``` typelead#603 changed from to stmt ```
``` f25ef18 ```
Contributor

### jarekratajski commented Jan 10, 2018 • edited

 Or after last change in Layout.hs I get now. [4 of 5] Compiling GHC.Magic ( GHC/Magic.hs, dist/build/GHC/Magic.jar ) [5 of 5] Compiling GHC.Classes ( GHC/Classes.hs, dist/build/GHC/Classes.jar ) eta: panic! (the 'impossible' happened) (Eta version 0.7.0b1): getCgIdInfo: no module eta_B1 which probably happens here: getGetDepCgLoad (NonVoid (StgVarArg var)) = Right <\$> cgLocation <\$> getCgIdInfo var
Member

### rahulmutt commented Jan 10, 2018 • edited

 @jarekratajski A nice way to debug codegen stuff is to add `-ddump-cg-trace` into the project you want to debug. In this case it's `ghc-prim` - so go to `libraries/ghc-prim/ghc-prim.cabal` and add `-ddump-cg-trace -ddump-to-file` to the `ghc-options:` field. In the trace, you can see which precise part of the codegen choked on your changes and where `eta_B1` comes from.
Member

### rahulmutt commented Jan 10, 2018 • edited

 That issue typically happens when you try to load an `Id` which hasn't been inserted into the bindings yet in the `CodeGen` monad. Seems like an ordering problem.
Contributor

### jarekratajski commented Jan 10, 2018

 Thanks. I've changed it ghc-options: -ddump-cg-trace -ddump-to-file -this-unit-id ghc-prim but still do not see any traceCG outputs. even in the created file ``````jarek@ubuntu:~/.etlas/logs/eta-0.7.0.1\$ cat ghc-prim-0.4.0.0-Jhi6UgHuZdoBZWUpVo3WKE.log etlas: Entering directory '.' Configuring ghc-prim-0.4.0.0... Preprocessing library for ghc-prim-0.4.0.0.. Building library for ghc-prim-0.4.0.0.. [1 of 5] Compiling GHC.Types ( GHC/Types.hs, dist/build/GHC/Types.jar ) [2 of 5] Compiling GHC.Tuple ( GHC/Tuple.hs, dist/build/GHC/Tuple.jar ) [3 of 5] Compiling GHC.CString ( GHC/CString.hs, dist/build/GHC/CString.jar ) [4 of 5] Compiling GHC.Magic ( GHC/Magic.hs, dist/build/GHC/Magic.jar ) [5 of 5] Compiling GHC.Classes ( GHC/Classes.hs, dist/build/GHC/Classes.jar ) eta: panic! (the 'impossible' happened) (Eta version 0.7.0b1): getCgIdInfo: no module eta_B1 Please report this as a Eta bug: http://github.com/typelead/eta/issues etlas: Leaving directory '.' ``````
Member

### rahulmutt commented Jan 10, 2018

 I changed those exact same options and I was able to find it here: `libraries/ghc-prim/dist/build/GHC/CString.dump-cg-trace` (for example)
Contributor

### jarekratajski commented Jan 10, 2018

 Thanks. That was it. Sorry, had no idea which file to look at. I see as last lines of Classes.dump-cg-trace ``````StgOpApp tagToEnum# [sat_sZF7] Bool generating \$wccall creating new closure...\$wccall StgOpApp __pkg_ccall_value ghc-prim False|False|2,True,False,False,"java/lang/Comparable","compareTo","(Ljava/lang/Object;)I" [sZFA] [ds_sZF8, ds1_sZF9, eta_B1] (# State# RealWorld, Int# #) StgApp sat_sZFB [eta_B1] cgIdApp: JumpToIt `````` so it must fail on `deps <- dependencies args` and then probably on this line ``````getGetDepCgLoad (NonVoid (StgVarArg var)) = Right <\$> cgLocation <\$> getCgIdInfo var `````` where getCgIdInfo fails
Contributor

### jarekratajski commented Jan 10, 2018

 After addeing some debug ``````JumpToIt label cgLocs mLne -> do traceCg (str "cgIdApp: JumpToIt") -- codes <- getNonVoidArgCodes args printBindings traceCg (ppr args) deps <- dependencies args traceCg (str "JAREK: deps calculated") emitMultiAssign cgLocs deps emit \$ maybe mempty (\(target, targetLoc) -> storeLoc targetLoc (iconst (locFt targetLoc) \$ fromIntegral target)) mLne <> goto label `````` I get the file as in attachment: Classes.dump-cg-trace.txt I do not know why getCgIdInfo var does not work even though I've copied more or less code from the previous getNonVoidArgCodes function. (Very likely it is something obvious and stupid that I do not see. My experience with haskell is mostly trying to solve this bug + doing exercises with quicksort and fibonacci ... not that great).
Contributor

### jarekratajski commented Jan 10, 2018

 And I see. I have not fully copied logic from getNonVoidArgCodes .. cause I forgot about nonvoid check.

### jarekratajski added a commit to jarekratajski/eta that referenced this issue Jan 10, 2018

``` typelead#603 still dirty but WORKING solution ```
``` 2582101 ```

### jarekratajski added a commit to jarekratajski/eta that referenced this issue Jan 10, 2018

``` typelead#603 fixed all calls to multiAssign - removed debugs ```
``` 35c12c4 ```

### jarekratajski added a commit to jarekratajski/eta that referenced this issue Jan 10, 2018

``` typelead#603 added test cases ```
``` 69aaf01 ```
Contributor

### jarekratajski commented Jan 10, 2018 • edited

 PR is is ready for review. I've added test cases that work. Cycles also seem to work. One strange thing is I had to actually reverse results of "topological sort" from Digraph see Line 83 from Layout.hs - but with this reverse all makes sense. For reference commit #2582101 contains lot of debugs that can be used (show how graph looks like). @rahulmutt Thank You for all the help! That was really nice way for me to learn Haskell (and do sth else than boring JavaEE/Tomcat stuff :-) I do at work ). I will have some time to fix polish this code if you have further hints. (Sorry for style and naming - probably I am doing Javaskell a little).

### jarekratajski added a commit to jarekratajski/eta that referenced this issue Jan 10, 2018

``` typelead#603 removed idea file ```
``` 7b2d2c1 ```
Member

### rahulmutt commented Jan 11, 2018

 Don't worry about styling/naming - I'll do it later in a cleanup commit. You got the core fix working which is enough :) I'll merge it now.

### rahulmutt added a commit that referenced this issue Jan 11, 2018

``` #603 fixing variables assign in TCO with topological sort (toMaster) ```
``` 978ee49 ```

### rahulmutt added a commit that referenced this issue Jan 11, 2018

``` #603 no longer working solution ```
``` bf56856 ```

### rahulmutt added a commit that referenced this issue Jan 11, 2018

``` #603 warnings cleaned ```
``` 2063c86 ```

### rahulmutt added a commit that referenced this issue Jan 11, 2018

``` #603 changed from to stmt ```
``` a867b0d ```

### rahulmutt added a commit that referenced this issue Jan 11, 2018

``` #603 still dirty but WORKING solution ```
``` 053f2ab ```

### rahulmutt added a commit that referenced this issue Jan 11, 2018

``` #603 fixed all calls to multiAssign - removed debugs ```
``` 45c0e2a ```

### rahulmutt added a commit that referenced this issue Jan 11, 2018

``` #603 added test cases ```
``` b5ca531 ```

### rahulmutt added a commit that referenced this issue Jan 11, 2018

``` #603 removed idea file ```
``` f7d4885 ```
Member

### rahulmutt commented Jan 11, 2018

 Verified the `master`'s fix is working. Closing. Thanks again @jarekratajski.

Merged

Closed