Don't leave a BitmapIndexed with only one child (issue #39). #46

wants to merge 1 commit into


None yet
2 participants

michalt commented Oct 9, 2012

In some cases filter and delete would leave a BitmapIndexed with only one child. But when the map was rebuild using fromList . toList then the internal representation of the tree was obviously slightly different. This broke equality since it assumes that equal maps are also structurally equal.


tibbe commented Oct 9, 2012

Please see 23b86f8 and issue #33 first to make sure we're not reintroducing that bug. If you want to collapse a level you need to make sure the right part of the hash (4 bits of the hash is used at each level) is used at the level the node is moved to.


michalt commented Oct 10, 2012

Seems like I should really read up some more on HAMTs. Sorry for the noise. I'll try to come up with something better (hopefully in the next few days).


tibbe commented Oct 10, 2012

@michalt I very much appreciate you trying to fix this.

Perhaps this helps you understand HAMTs a bit better: When we decide in which array position inside a BitmapIndexed node a subtree should go, we take a few bits of the hash and convert them into an Int in the range [0..16). The bits we use depend on which level of the tree we're at. This means that if we move a BitmapIndexed node to a new level, all array elements will need to have their indicies recomputed.

If we can implement equality without requiring that every function is strongly normalizing (i.e. guarantee that the same elements are always represented as the same tree) I think it will make the code base as a whole easier to work with, as there would be one less invariant to maintain.

You might want to look at the Clojure and/or Scala implementation of equality on HAMTs for inspiration of how to implement equality for our HAMT. I haven't had time to do so yet I'm afraid.


michalt commented Oct 10, 2012

I've skimmed through the paper on HAMTs and now I think that we can never move BitmapIndexed a level up. This is due to the fact that this is a trie, and if there is a BitmapIndexed with single child, it means that there is a number of elements that cannot be distinguished based on this part of the hash. Is that right?

If yes, then in the new patch, I'm careful to only move up Leaf. All tests pass (including the new regression one, @gbrsales: thanks!). What do you think?

Let me know if you prefer to change the equality instead (although the current solution, if correct, seems like the right thing to do, e.g., if we delete/filter out a lot from a HashMap we don't leave any unnecessary BitmapIndexed ).


tibbe commented Oct 10, 2012

@michalt You're right that we can never move a BitmapIndexed node up (we'd have to create a new one, with recomputed indices, which I don't think we want to do). I do think it makes sense to collapse Leaf and Collision nodes. I added issue #47 to track this issue. It's kinda separate from implementing equality, as it's a good idea regardless (e.g. the Scala implementation does this).

I would prefer to change the equality implementation instead of requiring the invariant that all equal maps have equal representations, if it all possible. It's all too easy to break that in new functions by not paying careful attention to collapsing subtrees. Also, I'm not sure that collapsing only leaves is enough to maintain the invariant. For example, you could collapse:

BitmapIndexed [BitmapIndexed [Leaf, Leaf]]

due to the second BitmapIndexed node has two children. However, this is a correct representation of the same tree:

BitmapIndexed [Leaf, Leaf]

To correctly collapse the tree to a canonical representation I think you'd need to look at the whole branch at once (which we don't do in e.g. delete or filter today).


michalt commented Oct 11, 2012

@tibbe I think I'm missing something. How could we collapse:

BitmapIndexed [BitmapIndexed [Leaf, Leaf]]


BitmapIndexed [Leaf, Leaf]


My current understanding of HAMTs is that in this example the two leafs do not differ in the first 4 bits of the hash. This means that both index (using the 4 bits of hash as an integer) into the same bit of the bitmask of the first BitmapIndexed. And so to tell them apart we introduce a second BitmapIndexed. Wouldn't collapsing here require somehow recording that 4 bits of hash should be skipped on subsequent lookup?

I see the same problem with Collision, i.e., the hashes of the two (or more) elements don't differ at all. So to get the indexing right, wouldn't we still have to keep all BitmapIndexed nodes above?

Note that this is quite different if we have just a Leaf -- we can always bring it up if it's the only child.


tibbe commented Oct 11, 2012

My thinking (which might be incorrect) is that we can get to the state in my example by starting with something like

BitmapIndexed [Leaf, BitmapIndexed [Leaf, Leaf]]

and then delete a Leaf now leaving the root BitmapIndexed with only one child, that we could collapse if the two "bottom" leaves don't collide in their 4 first hash bits. I might be wrong though.


michalt commented Oct 13, 2012

I think a nested BitmapIndexed is only created if the 4 bits collide. Otherwise they would index to separate children of the topmost BitmapIndexed. IOW it works like a trie and the two leafs share a common prefix (which in this case is the 4 bits of the hash; I guess one can think of this as a trie with alphabet of size 2^4).

To convince myself I've done a small experiment (on a 64 bit machine, so Int is 64 bits wide) and created a map with the keys 2^61 and 2^62 (hash for Int is identity). This results with a HashMap consisting of a chain of 16 BitmapIndexed, which makes sense since only the bottom one can distinguish between the hashes (we consume the hash starting with least significant bits).


tibbe commented Oct 16, 2012

@michalt You might be right. I'd need to sit down and think hard about it to make sure this is the case.

Can we try to split them problem in two:

  • Is there a way to write Eq without assuming that semantic equality implies structural equality?
  • Does collapsing BitmapIndexed nodes containing a single Leaf or Collision node make the structure have a one-to-one structure to semantic content mapping (i.e. the issue we're discussing now)?

michalt commented Oct 16, 2012

@tibbe For the first question I think the answer is yes -- I should have a patch tomorrow (I don't have time to test right now, sorry). As for the second one, I'm not entirely sure, but right now I believe that collapsing BitmapIndexed with single Leaf (not Collision!) should be both safe and enough to have a equality based on structure.


michalt commented Oct 18, 2012

@tibbe You're right -- we should also collapse BitmapIndexed with a single Collision (I thought that it might require some changes in the code, but on closer look it seems it should work just fine). I'll update the patch.


tibbe commented Oct 18, 2012

I merged #50 instead.

tibbe closed this Oct 18, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment