Experiment with Open Addressing HashMap and base it on Vector #1902

l0rinc · 2017-03-09T08:38:07Z

Since Hash Array Mapped Trie and Bit Mapped Trie are very similar conceptually (not even sure what the differences are), we could eliminate duplicate code if we would base the Maps on Vector, instead of recreating the array-emulating logic (where applicable, i.e. not for TreeMap).
(This is true even if we decide not to switch to Open Addressing.)

In addition to that, we could experiment with Open Addressing also - the current HashMaps use linking for collisions, right?

Pro:

Compared to chaining, linear probing has better cache locality, therefore it can be a lot faster
Could end up using a lot less memory

Contra:

It's difficult to implement
It cannot store null values (easily), might be problematic with primitive storage (a BitSet could help here, I think)
Delete will leave the keys and just change the values to null, might lead to memory leaks (I'm sure though that others have thought about this also)

The following resources could help us in designing it:

@ruslansennov has the expertise in designing the Maps, I know the Vector and @danieldietrich could mediate and contribute with his overall expertise. It could be a team-effort.
We could either create an experimental branch/package (or both), where we could collaborate on trying to achieve the fastest persistent map for the JVM - with automatic primitive support!

Related to: #1486

The text was updated successfully, but these errors were encountered:

danieldietrich · 2017-03-09T10:42:39Z

Fusing similar code and speeding up things + making them more memory efficient is alway a good idea.

I will create a feature branch that we can work on. The experimental HashMap could use the best of both worlds, BMT and HAMT. I.e. using the hashing stuff of HAMT and building a structure using BMT.

But caution: open-addressed impls may perform very very bad: http://stackoverflow.com/a/2557451/1110815

Open-addressing is usually faster than chained hashing when the load factor is low because you don't have to follow pointers between list nodes. It gets very, very slow if the load factor approaches 1, because you end up usually having to search through many of the slots in the bucket array before you find either the key that you were looking for or an empty slot. Also, you can never have more elements in the hash table than there are entries in the bucket array.

l0rinc · 2017-03-09T13:22:00Z

It gets very, very slow if the load factor approaches 1

The mentioned article addresses this with a logarithmic worst-case probing

danieldietrich · 2017-03-09T13:24:19Z

Yes, but if HAMT is moderately slower (absolutely) for the good case and more stable in direction to the worst case I would prefer HAMT. We have to check that. Scala uses HAMT for HashMap. This might have a reason.

danieldietrich · 2017-03-09T22:47:28Z

@paplorinc I've created the open-addressing-hash-map branch.

ruslansennov · 2017-03-10T03:32:20Z

Hi guys.
I believe it will be very interesting experiment. I will try to suggest cons-benchmark when prototype will be ready.

ruslansennov · 2017-03-10T03:33:19Z

(hope I can) 😄

l0rinc · 2017-03-10T19:38:51Z

a few notes:

if we store Vector<Tuple<K, V>>s in the nodes, we cannot take advantage of primitive support
- unless we generate corresponding Tuples also
- or we store the K and V in separate Vectors (we might lose cache locality this way)
- we could store two arrays in a specialized BMT somehow...
we could use BitSet to store info about the values, e.g. was it deleted or not (is needed for linear probing)
- currently BitSet update is linear, maybe we should base it on Vector also, as suggested in Investigate BitSet optimizations #1486
we may want to use better hashing than a simple modulo

I will experiment with these, please share your findings also :)

danieldietrich · 2017-03-10T21:22:53Z

we could store two arrays in a specialized BMT somehow...

I would not change the BMT for the purpose of an Open Addressing (OA) HashMap. The past development showed that it is best to create specialized backing structures for each purpose.

On the other hand we do not want to have another backing collection, we want to reduce them. Changing BMT in order to make it work for Vector and HashMap will make it too complicated (-> separation of concerns). In the end no one would know which line of code is necessary for which other collection...

we could use BitSet to store info about the values

Maybe this will make it too heavy-weight. We should try to stay on the lowest level possible for our backing collections: primitive types, bit-operations, ...

we may want to use better hashing than a simple modulo

Yes, definitely. The quality of the hashing algorithm is the key to an outstanding hashed collection.

I will try to get more involved after solving the issues I'm currently on - too many parallel tasks 😳

danieldietrich · 2018-10-03T01:14:06Z

Closed because this issue is stale.
If this is an issue, we need a new investigation, which map impl is state of the art to that moment.

danieldietrich added design/refactoring/improvement «vavr-collection» labels Mar 9, 2017

danieldietrich added this to the 2.1.0 milestone Mar 9, 2017

danieldietrich modified the milestones: 3.0.0, 2.1.0 Mar 9, 2017

danieldietrich added the nice to have label Jul 6, 2017

danieldietrich removed the nice to have label Aug 19, 2017

danieldietrich added strategic and removed design/refactoring/improvement labels Nov 26, 2017

danieldietrich modified the milestones: vavr-1.0.0, vavr-1.1.0 Nov 26, 2017

danieldietrich closed this as completed Oct 3, 2018

danieldietrich added the reverted/closed/duplicate label Oct 3, 2018

danieldietrich removed the strategic label Jul 29, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Experiment with Open Addressing HashMap and base it on Vector #1902

Experiment with Open Addressing HashMap and base it on Vector #1902

l0rinc commented Mar 9, 2017 •

edited

Loading

danieldietrich commented Mar 9, 2017

l0rinc commented Mar 9, 2017

danieldietrich commented Mar 9, 2017

danieldietrich commented Mar 9, 2017

ruslansennov commented Mar 10, 2017

ruslansennov commented Mar 10, 2017

l0rinc commented Mar 10, 2017

danieldietrich commented Mar 10, 2017

danieldietrich commented Oct 3, 2018

Experiment with Open Addressing HashMap and base it on Vector #1902

Experiment with Open Addressing HashMap and base it on Vector #1902

Comments

l0rinc commented Mar 9, 2017 • edited Loading

Pro:

Contra:

danieldietrich commented Mar 9, 2017

l0rinc commented Mar 9, 2017

danieldietrich commented Mar 9, 2017

danieldietrich commented Mar 9, 2017

ruslansennov commented Mar 10, 2017

ruslansennov commented Mar 10, 2017

l0rinc commented Mar 10, 2017

danieldietrich commented Mar 10, 2017

danieldietrich commented Oct 3, 2018

l0rinc commented Mar 9, 2017 •

edited

Loading