Wrong results from pure program using Data.Parallel #147

Closed
bartavelle opened this Issue Mar 21, 2017 · 44 comments


bartavelle commented Mar 21, 2017

This is unfortunately a bit of a bad bug report, because I have a whole program ...

With the following commit: https://github.com/bartavelle/gamebooksolver/tree/4c7028424ccf8b7309846d5c7d28ab7067d752af

Reproduction:

stack build
.stack-work/install/x86_64-linux/lts-8.4/8.0.2/bin/gamebooksolver-solvebook02 > /tmp/good
for i in `seq 1 50`; do time .stack-work/install/x86_64-linux/lts-8.4/8.0.2/bin/gamebooksolver-solvebook02 +RTS -N8 > /tmp/new && diff -u /tmp/good /tmp/new; done;

This will sometimes give a different result (reproduced by me on two computers, and by two #haskell users).

Only Data.HashMap.Strict is ever used here, and when switching to Data.Map.Strict I can't seem to reproduce the problem.

The program uses quite a bit of memory (~2 GB with the current settings), data-memocombinators, and parallel. It only happens when using -N8.


bartavelle commented Mar 21, 2017

(I meant to say that it doesn't seem to happen with -N1; I haven't tried other values.)


Owner

tibbe commented Mar 22, 2017

Ouch, that's a nasty bug. I don't know what to do about this unless we have a smaller test case. :/


bartavelle commented Mar 22, 2017

Yeah, I am sorry for this bug report. I don't think it would be easy to get a much smaller test case, though. I suspect it has to do with a combination of multithreading, data-memocombinators and unordered-containers. I might take a stab at it with a simpler game later.


pacak commented Apr 19, 2017

Problematic code in gamebooksolver looks like this:

regroup :: (Eq a, Hashable a) => [(a, Rational)] -> [(a, Rational)]
regroup = HM.toList . HM.fromListWith (+)

One of the fixes looks like this:

regroup :: (Eq a, Hashable a, Ord a) => [(a, Rational)] -> [(a, Rational)]
regroup = sort . HM.toList . HM.fromListWith (+)

Map.toList promises to return a list in deterministic order; HashMap.toList makes no such promise. regroup gets called a different number of times every run, with a different number of elements, so the resulting list gets shuffled. As soon as you make this order deterministic, the problem goes away.

I suspect ticket can be closed.


cgibbard commented Apr 19, 2017

I'm not sure this is quite the problem. So long as the order in which toList produces the elements is actually a pure function of the contents of the HashMap, it should be okay. If it's producing a different order on the same HashMap in different runs of the program, that's a serious issue.

This bug appeared with multiple runs of the very same program which didn't have any input effects, and only appeared when run with pure parallelism turned on. The result of a program should be the same when using par vs. when not using it, so I'm fairly sure it has something to do with thread-unsafety of HashMap.insert or one of the other functions which is using reallyUnsafePtrEquality# internally.
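The determinism argument can be illustrated with a tiny sketch, using par and pseq from base's GHC.Conc (the function name is illustrative, not from the program under discussion):

```haskell
import GHC.Conc (par, pseq)

-- Pure parallelism may change *when* the two halves are evaluated,
-- never *what* they evaluate to: the result must be identical with
-- -N1, -N8, or with `par` removed entirely.
sumHalves :: Int -> Int
sumHalves n = a `par` (b `pseq` (a + b))
  where
    a = sum [1 .. n `div` 2]
    b = sum [n `div` 2 + 1 .. n]

main :: IO ()
main = print (sumHalves 100)
```

A HashMap built by a pure program is held to the same standard: any run-to-run variation points at an evaluation-order bug, not at the program.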


pacak commented Apr 19, 2017

If it's producing a different order on the same HashMap in different runs of the program, that's a serious issue.

Documentation states

The order of its elements is unspecified.

So even if the order differs between runs, that's totally OK. But that's not the problem here, because the contents of the HashMap are different every time. Try using this code:

regroup :: (Eq a, Hashable a) => [(a, Rational)] -> [(a, Rational)]
regroup = HM.toList . (\h -> trace (show $ HM.size h) h) . HM.fromListWith (+)

and store the results in a file. Sort the contents of those files so the order is deterministic and compare them, for example with md5sum: HashMaps can't be equal if they contain different numbers of elements.


pacak commented Apr 20, 2017

The map sizes are also non-deterministic with Data.Map:

% cat 1.map | sort | md5sum
68c8b086c4ead0cf417d83acf5e1d724  -
% cat 2.map | sort | md5sum
d7cb9a0d8db0b9e6f4750c24399d2b32  -
% cat 3.map | sort | md5sum
66bf9213d3cac0e13d67e05cd7c9bc95  -

so with Map it only works because it produces a sorted list.


pacak commented Apr 20, 2017

So long as the order in which toList produces the elements is actually a pure function of the contents of the HashMap, it should be okay.

That's not true, by the way. toList is a pure function of the shape of the HashMap, and if you have collisions the shape depends on insertion order.

Consider a simple implementation of a HashMap as [a] with a hash function :: a -> Int. If there are several elements with the same hash, you can't store them in a single cell, so you either use an approach like cuckoo hashing (in which case you need to decide which element stays in its cell and which moves to a different one), or you turn the cells into lists themselves and do lookups based on the Eq instance. But since the only constraints you have are Eq and Hashable, you can't deterministically order colliding elements other than by insertion order.

HashMap seems to store collisions as arrays of leaves

    | Collision !Hash !(A.Array (Leaf k v))

and when inserting

    go h k x s t@(Collision hy v)
        | h == hy   = Collision h (updateOrSnocWith const k x v)

just appending new entries to the end of the array.

    go !k v !ary !i !n
        | i >= n = A.run $ do
            -- Not found, append to the end.
            mary <- A.new_ (n + 1)
            A.copy ary 0 mary 0 n
            A.write mary n (L k v)
            return mary

If it's producing a different order on the same HashMap in different runs of the program, that's a serious issue.

Actually it's the other way around: in applications that handle user-supplied data you might want a different shape (and a different order) on every run, in order to avoid collision-based DoS attacks.
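To make the insertion-order dependence concrete, here is a toy single-bucket model (hypothetical insertToy/toListToy/buildToy names; a degenerate hash makes every key collide, standing in for a HAMT Collision node):

```haskell
import Data.List (foldl')

-- A toy "hash map": one bucket per hash, colliding keys kept in a list.
type Bucket k v = (Int, [(k, v)])

-- updateOrSnoc semantics: replace in place if the key is found,
-- otherwise append to the end, as in the Collision case above.
insertToy :: Eq k => (k -> Int) -> k -> v -> [Bucket k v] -> [Bucket k v]
insertToy hash k v [] = [(hash k, [(k, v)])]
insertToy hash k v ((h, kvs) : rest)
  | hash k == h = (h, upsert kvs) : rest
  | otherwise   = (h, kvs) : insertToy hash k v rest
  where
    upsert [] = [(k, v)]
    upsert ((k', v') : xs)
      | k == k'   = (k, v) : xs
      | otherwise = (k', v') : upsert xs

toListToy :: [Bucket k v] -> [(k, v)]
toListToy = concatMap snd

-- Build with a constant hash so every key lands in the same bucket.
buildToy :: Eq k => [(k, v)] -> [Bucket k v]
buildToy = foldl' (\m (k, v) -> insertToy (const 0) k v m) []
```

Same contents, different insertion order, different toList order: buildToy [("a",1),("b",2)] yields [("a",1),("b",2)], while buildToy [("b",2),("a",1)] yields [("b",2),("a",1)].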


bartavelle commented Apr 23, 2017

The problematic code isn't about changing the order of the toList tuples; I am well aware they can vary between runs. I don't use the toList . fromList idiom to order the results, but to group them. The only ordering that matters to this application is sorting the result of toList by its values.
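That group-then-sort-by-value shape can be sketched in one line (Data.Map stands in for the HashMap so the snippet is self-contained, and regroupSorted is a hypothetical name, not the program's actual code):

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import qualified Data.Map.Strict as M

-- Group equal keys with (+), then order by value, largest first;
-- the key order produced by toList no longer matters.
regroupSorted :: (Ord a, Num v, Ord v) => [(a, v)] -> [(a, v)]
regroupSorted = sortOn (Down . snd) . M.toList . M.fromListWith (+)
```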


pacak commented Apr 23, 2017

I don't use the toList . fromList idiom to order the results, but to group them.

Right, it's fromListWith (+), but sorting the resulting list seems to fix the problem as well. I tried forcing the list, both before and after: forcing it before helps (I assume that destroys most of the parallelism); forcing it after does not.


bartavelle commented Apr 23, 2017

OTOH, reverse . M.toList . M.fromListWith (+) should work just as well as M.toList . M.fromListWith (+).


pacak commented Apr 23, 2017

OTOH, reverse . M.toList . M.fromListWith (+) should work just as well as M.toList . M.fromListWith (+).

That seems to be working. Interesting. I'll try a few more things later then.


pacak commented Apr 24, 2017

I managed to reduce the sample to this. HashMap indeed produces strange results:

regroup :: (Hashable a, Eq a) => [(a, Rational)] -> [(a, Rational)]
regroup xs =
    let xs' = HM.toList $ HM.fromListWith (+) xs
        !s' = sum (map snd xs')
        s  = sum (map snd xs)
     in if s' /= s
            then error $ "very bad" ++ show ("s'", s', "s", s)
            else xs'

In some cases s' is bigger than s, in some cases s' is smaller.

I'll try to minimize the test case.


pacak commented Apr 25, 2017

In the minimized test case the bug is reproducible on GHC 7.10 and 8.0, but not on the 8.2 RC. Hmmm...

Actually no: -O0 makes it reproducible on the GHC 8.2 RC as well.


pacak commented Apr 25, 2017

This particular bug is fixed by changing the BitmapIndexed branch of unsafeInsertWith:

-            A.unsafeUpdateM ary i st'
-            return t
+            let  ary' = A.update ary i st'
+            return $ BitmapIndexed b ary'

Still trying to figure out why.


liyang commented Apr 26, 2017

There is now a GHC ticket, and the test case there has been minimised much further.


bartavelle commented Apr 26, 2017

If that's really a GHC issue, then I suppose this ticket should be closed.


pacak commented Apr 26, 2017

If.


pacak commented Apr 27, 2017

While it's not clear precisely what closure we are entering multiple times, the fix is clear: ensure that unordered-containers is compiled with -feager-blackholing. However, we should also (at the very least) document this caveat better. The users guide currently advertises -feager-blackholing as an optimization to avoid redundant computation in parallel programs; in this case, however, it's absolutely critical for correctness.


Contributor

bgamari commented Apr 27, 2017

So the issue appears to be lazy blackholing. This is an optimization used by GHC to reduce the cost of blackholing closures. However, in the case of unordered-containers it compromises correctness as it allows multiple entries into unsafeInsertWith, which internally relies on mutability.

Clearly this needs to be documented a bit better in the users guide, but the fix is clear: ensure that unordered-containers is compiled with -feager-blackholing.
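The merged workaround (see #149) added the flag to the library's ghc-options in the cabal file; an equivalent per-module sketch, assuming the flag is accepted in an OPTIONS_GHC pragma (the module name is illustrative):

```haskell
{-# OPTIONS_GHC -feager-blackholing #-}

-- Eager blackholing makes a thread claim a thunk before evaluating it,
-- shrinking the window in which two threads can both enter the same
-- closure and duplicate its internal in-place updates.
module Data.HashMap.Base where
```

Note that an OPTIONS_GHC pragma only affects the module it appears in, which is why the real fix went into the package's build configuration.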

bgamari added a commit to bgamari/unordered-containers that referenced this issue Apr 27, 2017

Enable eager blackholing
While usually this is merely an optimization in parallel settings, in
our case it's absolutely essential for correctness due to the use of
internal mutability. See tibbe#147.


@tibbe tibbe closed this in #149 Apr 27, 2017


Owner

tibbe commented Apr 27, 2017

It sounds like eager blackholing is a broken optimization. :(


Collaborator

treeowl commented Apr 27, 2017

@tibbe lazy blackholing is a perfectly good optimization, but requires great care around unsafe IO-like operations. I don't really think using -feager-blackholing is the right solution. Better, I suspect, to consider each unsafe site carefully and see if it should use non-dupable unsafePerformIO, unsafeInterleaveIO, etc., or explicit noDuplicate, or some explicit synchronization (MVar or similar).
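The relationship between two of the options listed here can be sketched in a few lines; this mirrors how base builds the non-dupable variant from the dupable one (nonDupable is a hypothetical name):

```haskell
import GHC.IO (noDuplicate, unsafeDupablePerformIO)

-- A noDuplicate barrier before the action means that if two threads
-- race into the same thunk, one blocks until the other finishes, so
-- the effect runs at most once; the dupable variant skips that check
-- and is therefore cheaper but unsafe for mutation-bearing thunks.
nonDupable :: IO a -> a
nonDupable act = unsafeDupablePerformIO (noDuplicate >> act)
```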


Owner

tibbe commented Apr 27, 2017

It would be worth establishing if the unordered-containers code is actually wrong.

@treeowl note that this is all in ST, no IO.



Contributor

bgamari commented Apr 27, 2017

For the record, I don't believe that -feager-blackholing is the whole solution; it merely shrinks the window in which two threads could enter the closure in question. Offhand, the only way to close the window entirely is noDuplicate#, which seems extremely expensive to put in an inner loop. See bgamari@bf799b2 for my take on the noDuplicate# approach.


Contributor

bgamari commented Apr 30, 2017

For the record, I'm not confident that we can consider this closed quite yet.


Contributor

andrewthad commented Jun 14, 2017

I would agree that this should be reopened. Just to throw out another option for fixing this, unordered-containers could internally make use of a genuinely mutable hashmap with the same structure as the immutable one provided to the end user. Then there would just be one in-place freeze at the end, which I think would have logarithmic cost. This seems like the most correct solution although it's certainly more complicated than the current approach.
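The build-mutable-then-freeze pattern described here can be shown in miniature with a flat array from the boot array package (squaresFrozen is an illustrative name; runSTArray performs the single freeze when the ST computation finishes):

```haskell
import Data.Array (Array, elems)
import Data.Array.ST (newArray, runSTArray, writeArray)

-- All mutation happens inside ST; the structure escapes to the pure
-- world through exactly one freeze, so no mutable array is ever
-- visible from pure code.
squaresFrozen :: Int -> Array Int Int
squaresFrozen n = runSTArray $ do
  arr <- newArray (0, n - 1) 0
  mapM_ (\i -> writeArray arr i (i * i)) [0 .. n - 1]
  return arr
```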


pacak commented Jun 14, 2017

Then there would just be one in-place freeze at the end, which I think would have logarithmic cost.

https://ghc.haskell.org/trac/ghc/ticket/13615#comment:37

I've written a seriously legitimate modification of Data.HashMap.Strict.fromListWith that Ben indicates still demonstrates the problem. This version uses a separate type of mutable hashmap that holds MArrays. When it's done, it traverses the tree and (unsafely) freezes all the arrays. Unless I've goofed up, there should be no doubt that this version should work correctly.


Contributor

andrewthad commented Jun 14, 2017

Ah. The branch you are referring to is exactly what I had in mind. That's surprising that it exhibits the same problem.


pacak commented Jun 14, 2017

Ah. The branch you are referring to is exactly what I had in mind.

That's what I wanted to write myself as well, but never got around to doing it. The bug is still there, and the only safe solution I can suggest right now is to implement fromList using a fold.
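The fold-based fromList suggested here, sketched with Data.Map standing in for the HashMap so the snippet is self-contained (the real fix would fold HM.insertWith over the HAMT instead):

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Build the map one persistent insert at a time, instead of with a
-- fused fromListWith that mutates freshly allocated arrays in place.
fromListWithFold :: Ord k => (v -> v -> v) -> [(k, v)] -> M.Map k v
fromListWithFold f = foldl' (\m (k, v) -> M.insertWith f k v m) M.empty
```

Each step allocates a fresh path through the tree, so there is no window in which two threads can observe the same half-built node.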


Contributor

andrewthad commented Jun 14, 2017

I would like to see the test that @bgamari is running so that I can confirm this on my machine.


pacak commented Jun 14, 2017

I would like to see the test that @bgamari is running so that I can confirm this on my machine.

It's in the ticket description.

git clone https://github.com/pacak/cuddly-bassoon && cd cuddly-bassoon
cabal new-build
dist-newstyle/build/gamebooksolver-0.1.0.0/build/gamebooksolver-solvebook02/gamebooksolver-solvebook02


pacak commented Jun 14, 2017

Everything is in cuddly-bassoon. clone, compile and run.


Contributor

andrewthad commented Jun 14, 2017

I can confirm that I also get problems with the super-safe version. It's always the "Those are expected to be equal" message. I'm using a GHC 8.2 release candidate and cabal new-build.


pacak commented Jun 14, 2017

Expected to be equal, but not "WAT"?

     in if s' /= s
            then if show s' == show s
                    then error "WAT????"
                    else error $ "Those are expected to be equal" ++ show (s', s)
            else xs'

Contributor

andrewthad commented Jun 14, 2017

Yeah, I never get the "WAT????" message:

$ while true ; do echo "RUN" && ./dist-newstyle/build/gamebooksolver-0.1.0.0/build/gamebooksolver-solvebook02/gamebooksolver-solvebook02 ; done;
RUN
RUN
RUN
RUN
gamebooksolver-solvebook02: Those are expected to be equal(168,128)
CallStack (from HasCallStack):
  error, called at src/Main.hs:46:26 in main:Main
RUN
RUN
RUN
gamebooksolver-solvebook02: Those are expected to be equal(66,62)
CallStack (from HasCallStack):
  error, called at src/Main.hs:46:26 in main:Main
RUN
RUN
RUN
gamebooksolver-solvebook02: Those are expected to be equal(171,167)
CallStack (from HasCallStack):
  error, called at src/Main.hs:46:26 in main:Main
RUN
RUN
RUN

Contributor

andrewthad commented Jun 14, 2017

Found part of the problem. I've put up a fork of cuddly-bassoon. I've made three changes from the original code @pacak put up:

  1. Use @treeowl's unordered-containers (the super-safe version)
  2. Turn on the debug flag in unordered-containers
  3. Disable threaded in cuddly-bassoon

I see this happening every time I run the test (again, this is without any concurrency involved):

gamebooksolver-solvebook02: Data.HashMap.Array.copyM: src: bounds error, offset -1, length 3
CallStack (from HasCallStack):
  error, called at ./Data/HashMap/Array.hs:266:74 in unordered-containers-0.9.10-inplace:Data.HashMap.Array

Contributor

andrewthad commented Jun 14, 2017

Never mind, I think the check in copyM is actually wrong.


Contributor

andrewthad commented Jun 14, 2017

I've been looking at this for a while now, and it's really weird that @treeowl's branch has the same problem. I've slowly tried stripping out various in-place updates and replacing them by first making a copy of the structure and mutating that instead, and it doesn't help at all. The fact that the code runs fine in a single-threaded setting but not in a multi-threaded setting hints that, in the window before a blackhole is committed, two threads end up sharing a mutable variable. However, the code doesn't use any functions that are known to cause this: no lazy ST, no unsafeDupablePerformIO, just runST. The parts of hashable that get touched don't use them either.



Contributor

bgamari commented Jun 14, 2017

@andrewthad have a look at #13615 comment 36; I have a theory for what happens here but have been waiting for further feedback from @simonmar to confirm that it's plausible.


mpickering commented Jul 1, 2017

@bgamari submitted a patch which fixes this issue. https://phabricator.haskell.org/D3695


Contributor

bgamari commented Jul 1, 2017

This fix will be present in 8.2 (and is, in fact, one of the reasons why 8.2 is so terribly late). The -feager-blackholing fix that we merged here earlier is indeed only a workaround: eager blackholing does not use an atomic compare-and-swap, so there is still a small possibility for two threads to enter a non-duplicable thunk. Unfortunately, there is no good way to truly fix this in earlier GHC releases, as even noDuplicate# can be defeated by this issue.

I suggest that we drop -feager-blackholing when building with GHC 8.2 but retain it for all earlier compilers, to minimize the chance of this bug rearing its ugly head.
