
Changed Vector from HAMT to BMT #1504

Merged
merged 3 commits into from Aug 24, 2016

Conversation

@paplorinc (Contributor) commented Aug 17, 2016

Fixes: Implement Vector as bit-mapped vector trie #568.
Dependency for: #1449, #1522 and #1557.

This is the non-optimized version; for now I have only compared it against Scala's Vector, but feel free to compare it against other implementations.

A bit-mapped trie is a very wide and shallow tree (for integer indices the depth is ≤ 6).
Each node has at most 32 children (configurable).
Access to a given position is done by converting the index to a base-32 number and using each digit to descend the tree.
Modification works similarly, but copies the path along the way, returning a new root each time.
Append inserts into the last leaf, or, if the tree is full on the right, adds another layer on top of it (the old root becomes the first child of the new one).
Prepend works similarly, but needs an offset: adding a new top node (where the current root becomes the last child of the new root) shifts the indices by half of the current tree's full size, and the offset shifts them back to the correct index.
Slice is done by trimming the path from the root and discarding any leading/trailing values in effectively constant time (without the memory leak found in the Java and Clojure implementations).
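
The radix descent described above can be sketched as follows. This is a minimal, hypothetical illustration (not the PR's actual code), assuming a branching factor of 32 and a dense, offset-free trie; the names `BmtSketch` and `depthShift` are illustrative:

```java
// Minimal sketch of radix descent in a bit-mapped trie with branching factor 32.
// Each 5-bit slice of the index is one base-32 "digit" selecting a child node.
final class BmtSketch {
    static final int BITS = 5;               // log2(32): bits consumed per level
    static final int MASK = (1 << BITS) - 1; // 0b11111: selects one base-32 digit

    // Descend one level per digit; the last digit indexes into the leaf array.
    static Object get(Object[] root, int depthShift, int index) {
        Object[] node = root;
        for (int shift = depthShift; shift > 0; shift -= BITS) {
            node = (Object[]) node[(index >> shift) & MASK];
        }
        return node[index & MASK];
    }
}
```

With `depthShift = 5` (a two-level trie), index 33 resolves to child `33 >> 5 = 1` of the root and slot `33 & 31 = 1` of that leaf.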


Memory:

| Elements | `scala.collection.immutable.Vector` | `javaslang.collection.Vector` |
|---------:|-------------------------------------|-------------------------------|
| 32768 | 659 KB (20.7 KB overhead, 0.6 bytes/element) | 659 KB (20.6 KB overhead, 0.6 bytes/element) |
| 1024 | 19.3 KB (672 bytes overhead, 0.7 bytes/element) | 19.3 KB (664 bytes overhead, 0.6 bytes/element) |
| 32 | 536 bytes (32 bytes overhead, 1.0 bytes/element) | 528 bytes (24 bytes overhead, 0.8 bytes/element) |

Speed:

Operation  Ratio                                 32   1024  32768
Create     slang_persistent/scala_persistent  0.33×  0.11×  0.07×
Head       slang_persistent/scala_persistent  0.85×  0.64×  0.49×
Tail       slang_persistent/scala_persistent  1.04×  1.06×  0.96×
Get        slang_persistent/scala_persistent  1.22×  0.90×  0.85×
Update     slang_persistent/scala_persistent  1.04×  1.14×  1.24×
Prepend    slang_persistent/scala_persistent  0.64×  0.58×  0.43×
Append     slang_persistent/scala_persistent  0.94×  0.37×  0.29×
GroupBy    slang_persistent/scala_persistent  0.96×  0.31×  0.22×
Slice      slang_persistent/scala_persistent  0.79×  1.23×  1.64×
Iterate    slang_persistent/scala_persistent  1.72×  1.03×  0.95×

That is, for a small number of elements it often matches Scala's speed; for larger sizes it is often 2-3× slower.
Compared to the optimized version, this one is roughly:

  • create: ~30x slower
  • head: ~2x slower
  • tail: ~4x slower
  • get: ~0.7x slower (i.e. ~1.4x faster)
  • update: ~0.95x slower (i.e. ~1.05x faster)
  • prepend: ~2.5x slower
  • append: ~2.5x slower
  • groupby: ~3.5x slower
  • slice: ~3.5x slower
  • iterate: ~1.1x slower

@codecov-io commented Aug 17, 2016

Current coverage is 96.98% (diff: 90.65%)

Merging #1504 into master will decrease coverage by 0.04%

@@             master      #1504   diff @@
==========================================
  Files            87         88     +1   
  Lines         10541      10638    +97   
  Methods           0          0          
  Messages          0          0          
  Branches       1798       1814    +16   
==========================================
+ Hits          10228      10317    +89   
+ Misses          183        179     -4   
- Partials        130        142    +12   

Powered by Codecov. Last update 8360a92...9205299

* @author Ruslan Sennov
* @since 2.0.0
* @author Pap Lőrinc
* @since 3.0.0
Member:

Maybe it will make it into 2.1.0; the interface did not change (only methods were added).

@paplorinc (Contributor Author):

Force pushed a new version.
Thanks to @danieldietrich, I managed to reduce it to a single depth-modification method, extended with some lambdas. This way most methods are unified now!

We don't need to go down the tree and back up again (which saves time), but the lambdas slow it down a bit, so the end performance is basically the same as before (I updated the numbers in the description, though).

Now the non-optimized BMT is only 150 lines :).

@danieldietrich, @ruslansennov, @zsolt-donca, waiting eagerly for your reviews :)

return modifyLeaf(array, depthShift, index, node, leaf);
}

private static Object[] modifyLeaf(Object[] array, int depthShift, int index, NodeModifier nodeModifier, NodeModifier leafModifier) {
Contributor Author:

prepend, append, update, drop and take now use this iterative method instead
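
The shared iterative technique can be sketched for the update case. This is a hypothetical simplification (the PR generalizes the loop via `NodeModifier` lambdas so that prepend, append, update, drop and take all share it); `PathCopySketch` and its constants are illustrative names, not the PR's API:

```java
import java.util.Arrays;

// Sketch of iterative path copying in a bit-mapped trie (branching factor 32):
// copy only the nodes on the root-to-leaf path; all other subtrees stay shared.
final class PathCopySketch {
    static final int BITS = 5;               // log2(32)
    static final int MASK = (1 << BITS) - 1;

    static Object[] update(Object[] root, int depthShift, int index, Object element) {
        final Object[] newRoot = Arrays.copyOf(root, root.length); // copy the root first
        Object[] node = newRoot;
        for (int shift = depthShift; shift > 0; shift -= BITS) {
            final int childIndex = (index >> shift) & MASK;
            final Object[] child = (Object[]) node[childIndex];
            final Object[] newChild = Arrays.copyOf(child, child.length);
            node[childIndex] = newChild; // safe to mutate: node is a fresh copy
            node = newChild;             // untouched sibling subtrees stay shared
        }
        node[index & MASK] = element;    // set the value in the copied leaf
        return newRoot;
    }
}
```

A single downward pass suffices: since each node on the path is copied before descending, the leaf slot can be written directly without walking back up.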

Member:

I think we can save lambda instances by creating singletons. I'm not sure if the curried NODE version helps...

// names can be changed
private static final NodeModifier NODE = (o, i) -> copy(o, i + 1);
private static final Function<Object, NodeModifier> LEAF = element -> (o, i) -> copyUpdate(o, i, element);

// ...

private static Object[] modifyLeafValue(Object[] array, int depthShift, int index, Object element) {
    return modifyLeaf(array, depthShift, index, NODE, LEAF.apply(element));
}

Update: If the above makes sense, we don't need modifyLeafValue() at all and can use modifyLeaf() directly.

@paplorinc (Contributor Author) Aug 18, 2016:

There's a measurable difference this way 💃
Thanks @danieldietrich!

before:

Operation  Ratio                                32   1024  32768 
Update    slang_persistent/scala_persistent  1.03×  0.93×  1.17×
Prepend   slang_persistent/scala_persistent  0.64×  0.58×  0.44×
Append    slang_persistent/scala_persistent  0.84×  0.33×  0.26×
Slice     slang_persistent/scala_persistent  0.74×  1.19×  1.47×

after:

Operation  Ratio                                 32   1024  32768 
Update     slang_persistent/scala_persistent  1.04×  1.14×  1.24×
Prepend    slang_persistent/scala_persistent  0.64×  0.58×  0.43×
Append     slang_persistent/scala_persistent  0.94×  0.37×  0.29×
Slice      slang_persistent/scala_persistent  0.79×  1.23×  1.64×

Member:

Mmhh, interesting - do I interpret it right that creating more instances performs better?

Roughly how often is modifyLeafValue() called per operation? Zero or one times, or a logarithmic number of times?

I guess that before/after performs equally (modulo measurement error). Because we turned GC off, we don't see the effect of creating instances. However, if the method is not called often, it will not matter.

Contributor Author:

It's called once per operation, e.g. one update will call modifyLeaf once.
But don't worry about GC, through escape analysis short-lived lambdas might end up on the stack, instead of the heap (https://dzone.com/articles/java-lambdas-and-low-latency).

Member:

sounds good!

@danieldietrich danieldietrich added this to the 2.1.0 milestone Aug 18, 2016

/** clone the source and set the value at the given position */
static Object[] copyUpdate(Object arrayObject, int index, Object element) {
final Object[] array = (Object[]) arrayObject;
Contributor Author:

As stated in the JavaDoc above: the parameter is an untyped Object, to get rid of call-site casts (and to make room for non-Object-based arrays later).

@danieldietrich (Member):

Build fails with

[ERROR] /home/travis/build/javaslang/javaslang/javaslang-benchmark/src/test/java/javaslang/collection/CharSeqBenchmark.java:[54,98] method equals in class java.lang.Object cannot be applied to given types;
  required: java.lang.Object
  found: char[],char[]
  reason: actual and formal argument lists differ in length

return branchCount * fullBranchSize;
}

BitMappedTrie<T> prependLeaf(T leading) {
Contributor Author:

should be optimized by appending whole arrays instead...


Why is this method called prependLeaf as opposed to just prepend? Does it make sense to have, or can you imagine, an operation in which we prepend anywhere other than the first leaf?

Contributor Author:

Until now you could only append full leaves (proposed here as an optimization), and I didn't see the point in renaming it back and forth.
Renamed both :)

@ruslansennov (Member):

Sorry guys, I can't take part in the discussion, too busy until Monday.
I'll review this PR later (if it's still relevant).

assertAreEqual(actual, null, constant(actual), expected);
}

System.out.println("Depth " + depth + " ok!");
Member:

we should remove all System.out calls (and assert if necessary)

Contributor Author:

Sure, it was mainly for debugging (I've seen that many other Javaslang tests print out content, but I agree that they shouldn't) :)

Member:

we will clean up the others also (in another PR)

Contributor Author:

(I will keep System.out.println("using seed " + seed); as it's the only way to reproduce it in case of failure)

@danieldietrich (Member):

For me it looks good so far. I can pull it in if the System.outs are removed...

@@ -578,6 +581,7 @@ public void slang_persistent(Blackhole bh) {
for (int i = 1; !values.isEmpty(); i++) {
values = values.slice(1, values.size());
values = values.slice(0, values.size() - 1);
assert sizeOf(values.trie) < (branchingFactor() * sizeOf(copyRange(ELEMENTS, i, javaMutable.size() - i))); /* detect memory leak */
Contributor Author:

Added leak detection: if slice keeps only a few elements out of many, the resulting trie's size should be small.

@danieldietrich (Member) commented Aug 21, 2016:

We can optimize the partition method, which currently first fills an ArrayList, then calls ofAll(Iterable), which internally calls trie.append:

    @Override
    public Tuple2<Vector<T>, Vector<T>> partition(Predicate<? super T> predicate) {
        Objects.requireNonNull(predicate, "predicate is null");
        final ArrayList<T> left = new ArrayList<>(), right = new ArrayList<>();
        for (int i = 0; i < length(); i++) {
            final T t = get(i);
            (predicate.test(t) ? left : right).add(t);
        }
        return Tuple.of(ofAll(left), ofAll(right));
    }

I think we could save CPU + Mem by calling trie.append directly:

    @Override
    public Tuple2<Vector<T>, Vector<T>> partition(Predicate<? super T> predicate) {
        Objects.requireNonNull(predicate, "predicate is null");
        BitMappedTrie<T> left = BitMappedTrie.empty(), right = BitMappedTrie.empty(); // not final: reassigned in the loop
        for (T t : this) {
            if (predicate.test(t)) {
                left = left.append(t);
            } else {
                right = right.append(t);
            }
        }
        return Tuple.of(ofAll(left), ofAll(right));
    }

We can also iterate elements instead of calling get for each element.


Question: Does the JVM optimize length() calls here? It can't know that Vector is immutable and the value never changes...

for (int i = 0; i < length(); i++) {

Wouldn't it be better to first store length() in a variable and then check against that?

final int length = length();
for (int i = 0; i < length; i++) {

@paplorinc (Contributor Author) commented Aug 22, 2016:

I think we could save CPU + Mem by calling trie.append directly:

You're basing these optimizations on the unoptimized version.
In the new optimized version ofAll is ~90x faster (I'm working on that PR in parallel), so this method would become a lot slower if I changed it to append directly.
I suggest doing the optimizations later, when we can measure something concrete :).

Edit:

It can't know that Vector is immutable and the value never changes...

Actually, when optimizing, the JIT makes a few assumptions (e.g. "only these two switch branches seem to be used, so throw out the rest"), as long as the assumption can be corrected at a small cost when it is violated. Therefore it can inline length() (but even if it doesn't, storing it in a local would likely not bring any measurable gain, while complicating the code slightly).

It's different for the iterator, though: when an inner class refers to the outer one, extracting the length makes sense (but again, it has to be measured case by case; unfortunately, I've had too many surprises).

@paplorinc (Contributor Author):

@danieldietrich, @ruslansennov, @zsolt-donca, @eduardmanas, did I answer or apply all your requests?
Can we merge it :)?

*
* @param <T> Component type of the Vector.
* @author Ruslan Sennov
* @author Ruslan Sennov, Pap Lőrinc
* @see BitMappedTrie
Member:

Maybe my last comment :) BitMappedTrie has package visibility, so I believe this line should be removed (?)

Contributor Author:

I think this can be useful when someone's viewing the sources (I commented BMT thoroughly).
@danieldietrich?

Member:

@paplorinc Please remove the line. Instead you could modify the first line and mention that our Vector is based on a Bit Mapped Trie implementation.

Contributor Author:

Done, let's merge it, as I have at least 2 more PRs to prepare :)

@zsolt-donca:

@paplorinc Mine are all addressed!

Keep up the good work!

* @author Pap Lőrinc
* @since 2.1.0
*/
final class BitMappedTrie<T> implements Serializable {
Member:

Hi @paplorinc, great work so far. This implementation is smaller than I expected!

Before pulling it I want to see just one more thing: Please make BitMappedTrie Iterable and implement the iterator() method in the way that it does not use get(index). Instead it should use the internal BMT structure to efficiently iterate the elements (i.e. not getting the array on every step, instead use the offset until the end of the array is reached, right?).

This is inevitable because otherwise either the internal structure would leak to the outside or we could not leverage the knowledge about the internal structure and have to fall back to less efficient ways to access elements. The latter case is currently implemented (using get(index) each step).

I think that it will be a better base for comparing the upcoming optimizations.

It shouldn't be much effort, should it?
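
The requested iterator could be sketched as follows. This is a hypothetical illustration, not the PR's implementation: it caches the current leaf array and descends the trie only when that leaf is exhausted, instead of performing a full get(index) per element. The names `LeafIterator`, `descendToLeaf` and `depthShift` are assumptions:

```java
import java.util.Iterator;

// Sketch: iterate a bit-mapped trie (branching factor 32) by caching the
// current leaf, so the tree is only re-descended once per 32 elements.
final class LeafIterator implements Iterator<Object> {
    static final int BITS = 5;               // log2(32)
    static final int MASK = (1 << BITS) - 1;

    private final Object[] root;
    private final int depthShift;
    private final int length;
    private int globalIndex = 0; // position in the whole trie
    private int leafIndex = 0;   // position within the cached leaf
    private Object[] leaf;       // currently cached leaf array

    LeafIterator(Object[] root, int depthShift, int length) {
        this.root = root;
        this.depthShift = depthShift;
        this.length = length;
        this.leaf = descendToLeaf(0);
    }

    @Override public boolean hasNext() { return globalIndex < length; }

    @Override public Object next() {
        if (leafIndex > MASK) {              // current leaf exhausted
            leaf = descendToLeaf(globalIndex);
            leafIndex = 0;
        }
        globalIndex++;
        return leaf[leafIndex++];
    }

    private Object[] descendToLeaf(int index) {
        Object[] node = root;
        for (int shift = depthShift; shift > 0; shift -= BITS) {
            node = (Object[]) node[(index >> shift) & MASK];
        }
        return node;
    }
}
```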

Contributor Author:

Sure, will do, but I will have to delete it later, when leading and trailing are introduced to optimize append and prepend (to be able to iterate over all arrays; three iterators cannot be combined efficiently otherwise).

Contributor Author:

Force pushed a new version.
@danieldietrich, I thought you were interested in the simplest solution in this commit.
I consider this explicit iterator an optimization.

Member:

We will see. In the design of types it is the right place for iterator(). We will take a look at leading and trailing next.

In Scala Vector is in between Array and List. List is best at prepend() & tail(), Array is best at random access get() and update(). This is the goal for Vector optimizations, it should be good at both and be in between Array and List. That's what I'm looking at, not the optimization of specific methods like groupBy() and slice() for example.

I'm really looking forward to review leading() and trailing(), especially on the performance changes of get(), update(), append(), prepend() and tail().


Iterator<T> iterator() {
return new Iterator<T>() {
int globalIndex = 0;
Member:

Please create `final int length = BitMappedTrie.this.length();`

@paplorinc (Contributor Author):

Pushed; could we please merge this and move on to the optimizations?

@danieldietrich (Member):

ok, thank you, pulling it in now

@danieldietrich danieldietrich merged commit 9311fee into vavr-io:master Aug 24, 2016
@paplorinc paplorinc deleted the NewVector branch August 25, 2016 07:18
@paplorinc (Contributor Author):

ok, thank you, pulling it in now

Please review #1522 also.
