Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test "Pipes: HMM pipe model is equivalent to standard model" is flaky #283

Closed
turion opened this issue Jun 12, 2023 · 7 comments
Closed

Comments

@turion
Copy link
Collaborator

turion commented Jun 12, 2023

Failures like these occur often:

Failures:

  test/Spec.hs:141:5: 
  1) Pipes: HMM pipe model is equivalent to standard model
       Falsified (after 32 tests and 4 shrinks):
         [0.0,0.0,-1.6666666666666667,-1.6666666666666667]

  To rerun use: --match "/Pipes: HMM/pipe model is equivalent to standard model/"

Randomized with seed 309526202

This might even lead to monad-bayes getting marked broken in nixpkgs, see NixOS/nixpkgs#236431

@turion
Copy link
Collaborator Author

turion commented Jun 12, 2023

Also see #260 , we should move pipes to a separate package. Then only that separate package would fail.

@turion
Copy link
Collaborator Author

turion commented Jun 12, 2023

Might be caused by nick8325/quickcheck#359

@turion
Copy link
Collaborator Author

turion commented Jun 12, 2023

Or similar to haskell/vector#460 it might be that the new generator produces more values for which we experience overflows.

@turion
Copy link
Collaborator Author

turion commented Jun 12, 2023

Easily reproduceable wtih this diff:

diff --git a/monad-bayes.cabal b/monad-bayes.cabal
index e74b1fe..94b3fed 100644
--- a/monad-bayes.cabal
+++ b/monad-bayes.cabal
@@ -189,7 +189,7 @@ test-suite monad-bayes-test
     , pipes
     , pretty-simple
     , profunctors
-    , QuickCheck
+    , QuickCheck == 2.14.3
     , random
     , statistics
     , text

And a file cabal.project.local containing:

tests: True

@turion
Copy link
Collaborator Author

turion commented Jun 12, 2023

The issue is in the hard sorting function in the compact function. I added the following diff to inspect the generated values:

diff --git a/test/TestPipes.hs b/test/TestPipes.hs
index 4efe68a..fe51a12 100644
--- a/test/TestPipes.hs
+++ b/test/TestPipes.hs
@@ -8,12 +8,14 @@ import Control.Monad.Bayes.Enumerator (enumerator)
 import Data.AEq (AEq ((~==)))
 import HMM (hmm, hmmPosterior)
 import Pipes.Prelude (toListM)
+import Test.QuickCheck (counterexample, Property, property)
 
 urns :: Int -> Bool
 urns n = enumerator (urn n) ~== enumerator (urnP n)
 
-hmms :: [Double] -> Bool
+hmms :: [Double] -> Property
 hmms observations =
   let hmmWithoutPipe = hmm observations
       hmmWithPipe = reverse . init <$> toListM (hmmPosterior observations)
-   in enumerator hmmWithPipe ~== enumerator hmmWithoutPipe
+      msg = unlines $ show <$> [("with   ", enumerator hmmWithPipe), ("without", enumerator hmmWithoutPipe)]
+  in counterexample msg $ property $ (fst <$> enumerator hmmWithPipe) ~== (fst <$> enumerator hmmWithoutPipe)

This gave e.g.

  1) Pipes: HMM pipe model is equivalent to standard model
       Falsified (after 9 tests and 2 shrinks):
         [0.0,0.0,-0.5]
         ("with   ",[([1,2,0],0.1546027526337784),([2,0,1],7.073671679025113e-2),([1,1,2],6.82502090909774e-2),([1,2,1],6.184110105351136e-2),([1,1,1],5.85001792208378e-2),([2,2,0],5.305253759268838e-2),([2,0,2],5.305253759268837e-2),([1,2,2],4.638082579013355e-2),([1,1,0],3.900011948055852e-2),([0,2,0],3.8836185197532215e-2),([2,1,2],3.643161738034025e-2),([2,0,0],3.53683583951256e-2),([1,0,1],3.533777203057793e-2),([2,1,1],3.122710061172023e-2),([0,1,2],2.6669130334371047e-2),([1,0,2],2.6503329022933453e-2),([0,1,1],2.2859254572318057e-2),([2,2,1],2.1221015037075363e-2),([2,1,0],2.0818067074480157e-2),([1,0,0],1.7668886015288984e-2),([2,2,2],1.591576127780652e-2),([0,2,1],1.5534474079012886e-2),([0,1,0],1.5239503048212037e-2),([0,2,2],1.1650855559259665e-2),([0,0,1],1.0356316052675273e-2),([0,0,2],7.767237039506448e-3),([0,0,0],5.1781580263376365e-3)])
         ("without",[([1,2,0],0.15460275263377876),([2,0,1],7.073671679025137e-2),([1,1,2],6.82502090909775e-2),([1,2,1],6.184110105351154e-2),([1,1,1],5.850017922083786e-2),([2,0,2],5.3052537592688506e-2),([2,2,0],5.3052537592688506e-2),([1,2,2],4.6380825790133653e-2),([1,1,0],3.900011948055858e-2),([0,2,0],3.883618519753228e-2),([2,1,2],3.643161738034034e-2),([2,0,0],3.536835839512568e-2),([1,0,1],3.533777203057802e-2),([2,1,1],3.122710061172028e-2),([0,1,2],2.666913033437113e-2),([1,0,2],2.650332902293353e-2),([0,1,1],2.2859254572318098e-2),([2,2,1],2.122101503707541e-2),([2,1,0],2.081806707448021e-2),([1,0,0],1.7668886015289015e-2),([2,2,2],1.5915761277806563e-2),([0,2,1],1.5534474079012931e-2),([0,1,0],1.5239503048212075e-2),([0,2,2],1.1650855559259689e-2),([0,0,1],1.0356316052675282e-2),([0,0,2],7.767237039506465e-3),([0,0,0],5.178158026337637e-3)])

Look closely at the 6th and 7th elements.

([2,2,0],5.305253759268838e-2),([2,0,2],5.305253759268837e-2)

vs.

([2,0,2],5.3052537592688506e-2),([2,2,0],5.3052537592688506e-2)

The floating point numbers are so close that they are ordered in two different ways. The equality check at the end is fine with small differences because it uses ~==, but the sorting algorithm distinguishes them because it uses hard comparison. A simple solution is to sort again before testing. PR upcoming.

turion added a commit to turion/monad-bayes that referenced this issue Jun 12, 2023
@reubenharry
Copy link
Contributor

The stuff with pipes was added by me, partly to make MCMC smoother, partly to explore the stuff we're now exploring with dunai/Rhine. It's really not important, and if it's a pain point, can happily be removed. Sorry for the hassle there.

@turion turion closed this as completed in 060cbb4 Jun 13, 2023
turion added a commit that referenced this issue Jun 13, 2023
Fix #283, fixes flaky pipes test
@turion
Copy link
Collaborator Author

turion commented Jun 13, 2023

No worries. It's actually helpful to have two different implementations of something for comparison. So I think there is value in that test. But it would be great if we can tackle #260.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants