Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Don't leave a BitmapIndexed with only one child (issue #39). #46

Closed
wants to merge 1 commit into from

2 participants

@michalt

In some cases filter and delete would leave a BitmapIndexed with only one child. But when the map was rebuild using fromList . toList then the internal representation of the tree was obviously slightly different. This broke equality since it assumes that equal maps are also structurally equal.

@tibbe
Owner

Please see 23b86f8 and issue #33 first to make sure we're not reintroducing that bug. If you want to collapse a level you need to make sure the right part of the hash (4 bits of the hash is used at each level) is used at the level the node is moved to.

@michalt

Seems like I should really read up some more on HAMTs. Sorry for the noise. I'll try to come up with something better (hopefully in the next few days).

@tibbe
Owner

@michalt I very much appreciate you trying to fix this.

Perhaps this helps you understand HAMTs a bit better: When we decide in which array position inside a BitmapIndexed node a subtree should go, we take a few bits of the hash and convert them into an Int in the range [0..16). The bits we use depend on which level of the tree we're at. This means that if we move a BitmapIndexed node to a new level, all array elements will need to have their indicies recomputed.

If we can implement equality without requiring that every function is strongly normalizing (i.e. guarantee that the same elements are always represented as the same tree) I think it will make the code base as a whole easier to work with, as there would be one less invariant to maintain.

You might want to look at the Clojure and/or Scala implementation of equality on HAMTs for inspiration of how to implement equality for our HAMT. I haven't had time to do so yet I'm afraid.

@michalt

I've skimmed through the paper on HAMTs and now I think that we can never move BitmapIndexed a level up. This is due to the fact that this is a trie, and if there is a BitmapIndexed with single child, it means that there is a number of elements that cannot be distinguished based on this part of the hash. Is that right?

If yes, then in the new patch, I'm careful to only move up Leaf. All tests pass (including the new regression one, @gbrsales: thanks!). What do you think?

Let me know if you prefer to change the equality instead (although the current solution, if correct, seems like the right thing to do, e.g., if we delete/filter out a lot from a HashMap we don't leave any unnecessary BitmapIndexed ).

@tibbe
Owner

@michalt You're right that we can never move a BitmapIndexed node up (we'd have to create a new one, with recomputed indices, which I don't think we want to do). I do think it makes sense to collapse Leaf and Collision nodes. I added issue #47 to track this issue. It's kinda separate from implementing equality, as it's a good idea regardless (e.g. the Scala implementation does this).

I would prefer to change the equality implementation instead of requiring the invariant that all equal maps have equal representations, if it all possible. It's all too easy to break that in new functions by not paying careful attention to collapsing subtrees. Also, I'm not sure that collapsing only leaves is enough to maintain the invariant. For example, you could collapse:

BitmapIndexed [BitmapIndexed [Leaf, Leaf]]

due to the second BitmapIndexed node has two children. However, this is a correct representation of the same tree:

BitmapIndexed [Leaf, Leaf]

To correctly collapse the tree to a canonical representation I think you'd need to look at the whole branch at once (which we don't do in e.g. delete or filter today).

@michalt

@tibbe I think I'm missing something. How could we collapse:

BitmapIndexed [BitmapIndexed [Leaf, Leaf]]

into

BitmapIndexed [Leaf, Leaf]

?

My current understanding of HAMTs is that in this example the two leafs do not differ in the first 4 bits of the hash. This means that both index (using the 4 bits of hash as an integer) into the same bit of the bitmask of the first BitmapIndexed. And so to tell them apart we introduce a second BitmapIndexed. Wouldn't collapsing here require somehow recording that 4 bits of hash should be skipped on subsequent lookup?

I see the same problem with Collision, i.e., the hashes of the two (or more) elements don't differ at all. So to get the indexing right, wouldn't we still have to keep all BitmapIndexed nodes above?

Note that this is quite different if we have just a Leaf -- we can always bring it up if it's the only child.

@tibbe
Owner

My thinking (which might be incorrect) is that we can get to the state in my example by starting with something like

BitmapIndexed [Leaf, BitmapIndexed [Leaf, Leaf]]

and then delete a Leaf now leaving the root BitmapIndexed with only one child, that we could collapse if the two "bottom" leaves don't collide in their 4 first hash bits. I might be wrong though.

@michalt

I think a nested BitmapIndexed is only created if the 4 bits collide. Otherwise they would index to separate children of the topmost BitmapIndexed. IOW it works like a trie and the two leafs share a common prefix (which in this case is the 4 bits of the hash; I guess one can think of this as a trie with alphabet of size 2^4).

To convince myself I've done a small experiment (on a 64 bit machine, so Int is 64 bits wide) and created a map with the keys 2^61 and 2^62 (hash for Int is identity). This results with a HashMap consisting of a chain of 16 BitmapIndexed, which makes sense since only the bottom one can distinguish between the hashes (we consume the hash starting with least significant bits).

@tibbe
Owner

@michalt You might be right. I'd need to sit down and think hard about it to make sure this is the case.

Can we try to split them problem in two:

  • Is there a way to write Eq without assuming that semantic equality implies structural equality?
  • Does collapsing BitmapIndexed nodes containing a single Leaf or Collision node make the structure have a one-to-one structure to semantic content mapping (i.e. the issue we're discussing now)?
@michalt

@tibbe For the first question I think the answer is yes -- I should have a patch tomorrow (I don't have time to test right now, sorry). As for the second one, I'm not entirely sure, but right now I believe that collapsing BitmapIndexed with single Leaf (not Collision!) should be both safe and enough to have a equality based on structure.

@michalt

@tibbe You're right -- we should also collapse BitmapIndexed with a single Collision (I thought that it might require some changes in the code, but on closer look it seems it should work just fine). I'll update the patch.

@tibbe
Owner

I merged #50 instead.

@tibbe tibbe closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
This page is out of date. Refresh to see the latest.
Showing with 14 additions and 1 deletion.
  1. +14 −1 Data/HashMap/Base.hs
View
15 Data/HashMap/Base.hs
@@ -479,7 +479,15 @@ delete k0 m0 = go h0 k0 0 m0
then t
else case st' of
Empty | A.length ary == 1 -> Empty
- | otherwise -> BitmapIndexed (b .&. complement m) (A.delete ary i)
+ | A.length ary == 2 ->
+ case (i, A.index ary 0, A.index ary 1) of
+ (0, _, l@(Leaf _ _)) -> l
+ (1, l@(Leaf _ _), _) -> l
+ _ -> bIndexed
+ | otherwise -> bIndexed
+ where
+ bIndexed = BitmapIndexed (b .&. complement m) (A.delete ary i)
+ l@(Leaf _ _) | A.length ary == 1 -> l
_ -> BitmapIndexed b (A.update ary i st')
where m = mask h s
i = sparseIndex b m
@@ -826,6 +834,11 @@ filterWithKey pred = go
step !ary !mary !b i !j !bi n
| i >= n = case j of
0 -> return Empty
+ 1 -> do
+ ch <- A.read mary 0
+ case ch of
+ l@(Leaf _ _) -> return l
+ _ -> BitmapIndexed b <$> trim mary 1
_ -> do
ary2 <- trim mary j
return $! if j == maxChildren
Something went wrong with that request. Please try again.