Changed Vector from HAMT to BMT #1504
Conversation
Current coverage is 96.98% (diff: 90.65%)

@@             master     #1504    diff @@
==========================================
  Files            87        88      +1
  Lines         10541     10638     +97
  Methods           0         0
  Messages          0         0
  Branches       1798      1814     +16
==========================================
+ Hits          10228     10317     +89
+ Misses          183       179      -4
- Partials        130       142     +12
 * @author Ruslan Sennov
 * @since 2.0.0
 * @author Pap Lőrinc
 * @since 3.0.0
Maybe it will make it into 2.1.0; the interface did not change (only methods were added).
Force pushed a new version. We don't need to go down the tree and back again (saves time), but the lambdas slow it down a bit, so the end performance is basically the same as before (updated it in the description, though). @danieldietrich, @ruslansennov, @zsolt-donca, waiting eagerly for your reviews :)
    return modifyLeaf(array, depthShift, index, node, leaf);
}

private static Object[] modifyLeaf(Object[] array, int depthShift, int index, NodeModifier nodeModifier, NodeModifier leafModifier) {
prepend, append, update, drop and take now use this iterative method instead
I think we can save lambda instances by creating singletons. I'm not sure if the curried NODE version helps...

// names can be changed
private static final NodeModifier NODE = (o, i) -> copy(o, i + 1);
private static final Function<Object, NodeModifier> LEAF = element -> (o, i) -> copyUpdate(o, i, element);
// ...
private static Object[] modifyLeafValue(Object[] array, int depthShift, int index, Object element) {
    return modifyLeaf(array, depthShift, index, NODE, LEAF.apply(element));
}

Update: If the above makes sense, we don't need modifyLeafValue() at all and can use modifyLeaf() directly.
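The singleton idea can be shown in isolation. Below is a minimal, self-contained sketch: `NodeModifier`'s shape and the `Arrays.copyOf`-based helpers are stand-ins mirroring the snippet above, not the PR's actual code. The point is that `NODE` is allocated exactly once, while only the leaf modifier (which must capture the element) is created per call.

```java
import java.util.Arrays;
import java.util.function.Function;

public class LambdaSingletons {
    // stand-in for the PR's NodeModifier interface (hypothetical signature)
    interface NodeModifier {
        Object[] apply(Object[] array, int index);
    }

    // allocated once, reused for every call: no per-operation garbage
    static final NodeModifier NODE = (o, i) -> Arrays.copyOf(o, i + 1);

    // curried: one short-lived closure per element, capturing only what it must
    static final Function<Object, NodeModifier> LEAF = element -> (o, i) -> {
        final Object[] copy = Arrays.copyOf(o, o.length); // clone the leaf
        copy[i] = element;                                // set the new value
        return copy;
    };

    public static void main(String[] args) {
        final Object[] leaf = {1, 2, 3};
        System.out.println(Arrays.toString(LEAF.apply(42).apply(leaf, 1))); // [1, 42, 3]
        System.out.println(Arrays.toString(NODE.apply(leaf, 1)));           // [1, 2]
    }
}
```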
There's a measurable difference this way 💃
Thanks @danieldietrich!
before:
Operation Ratio 32 1024 32768
Update slang_persistent/scala_persistent 1.03× 0.93× 1.17×
Prepend slang_persistent/scala_persistent 0.64× 0.58× 0.44×
Append slang_persistent/scala_persistent 0.84× 0.33× 0.26×
Slice slang_persistent/scala_persistent 0.74× 1.19× 1.47×
after:
Operation Ratio 32 1024 32768
Update slang_persistent/scala_persistent 1.04× 1.14× 1.24×
Prepend slang_persistent/scala_persistent 0.64× 0.58× 0.43×
Append slang_persistent/scala_persistent 0.94× 0.37× 0.29×
Slice slang_persistent/scala_persistent 0.79× 1.23× 1.64×
Mmhh, interesting - do I interpret it right that creating more instances performs better?
How frequently is modifyLeafValue() called per operation, roughly? Zero or once / logarithmically many times?
I guess that before/after perform equally (modulo measurement error). Because we turned GC off, we don't see the effect of creating instances. However, if the method is not called often, it will not matter.
It's called once per operation, e.g. one update will call modifyLeaf once.
But don't worry about GC; through escape analysis, short-lived lambdas might end up on the stack instead of the heap (https://dzone.com/articles/java-lambdas-and-low-latency).
sounds good!
/** clone the source and set the value at the given position */
static Object[] copyUpdate(Object arrayObject, int index, Object element) {
    final Object[] array = (Object[]) arrayObject;
As stated in the above JavaDoc: untyped Object array, to get rid of call-site casts (and make room for non-Object based arrays also).

Build fails with:

[ERROR] /home/travis/build/javaslang/javaslang/javaslang-benchmark/src/test/java/javaslang/collection/CharSeqBenchmark.java:[54,98] method equals in class java.lang.Object cannot be applied to given types;
  required: java.lang.Object
  found: char[],char[]
  reason: actual and formal argument lists differ in length
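To illustrate the pattern being discussed, here is a minimal sketch, assuming a helper shaped like the one in the diff (the class name and copy logic are illustrative, not the PR's code): taking `Object` instead of `Object[]` keeps the single cast inside the helper, so call sites stay cast-free, and the same entry point could later dispatch on primitive arrays such as `char[]`.

```java
import java.util.Arrays;

public class UntypedArrays {
    /** clone the source and set the value at the given position */
    static Object[] copyUpdate(Object arrayObject, int index, Object element) {
        final Object[] array = (Object[]) arrayObject; // the only cast, hidden from callers
        final Object[] copy = Arrays.copyOf(array, array.length);
        copy[index] = element;
        return copy;
    }

    public static void main(String[] args) {
        final Object node = new Object[] {"a", "b"};       // stored untyped inside the trie
        final Object[] updated = copyUpdate(node, 1, "c"); // no cast at the call site
        System.out.println(Arrays.toString(updated));      // [a, c]
    }
}
```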
    return branchCount * fullBranchSize;
}

BitMappedTrie<T> prependLeaf(T leading) {
should be optimized by appending whole arrays instead...
Why is this method called prependLeaf as opposed to just prepend? Does it make sense to have, or can you imagine, an operation in which we prepend anywhere other than the first leaf?
Until now you could only append full leaves (proposed here as an optimization) and I didn't see the point in renaming it back and forth. Renamed both :)

Sorry guys, I can't take part in the discussion, too busy until Monday.
    assertAreEqual(actual, null, constant(actual), expected);
}

System.out.println("Depth " + depth + " ok!");
we should remove all System.out calls (and assert if necessary)
Sure, it was mainly for debugging (I've seen that many other Javaslang tests print out content, but I agree that they shouldn't) :)
we will clean up the others also (in another PR)
(I will keep System.out.println("using seed " + seed); as it's the only way to reproduce a failure)

For me it looks good so far. I can pull it in if the System.outs are removed...
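The reason for keeping that one println can be shown with a minimal sketch (class and method names are hypothetical, not the PR's test code): printing the seed is what makes a failing randomized run replayable, since re-running with the printed seed reproduces the exact same inputs.

```java
import java.util.Random;

public class SeededTest {
    /** run one randomized step; the seed is printed so a failure can be replayed */
    static int randomValue(long seed) {
        System.out.println("using seed " + seed); // the only way to reproduce a failing run
        return new Random(seed).nextInt(100);
    }

    public static void main(String[] args) {
        final long seed = new Random().nextLong();  // fresh seed per run
        final int value = randomValue(seed);
        // re-running with the same printed seed yields the same value
        System.out.println(value == randomValue(seed)); // true
    }
}
```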
@@ -578,6 +581,7 @@ public void slang_persistent(Blackhole bh) {
    for (int i = 1; !values.isEmpty(); i++) {
        values = values.slice(1, values.size());
        values = values.slice(0, values.size() - 1);
        assert sizeOf(values.trie) < (branchingFactor() * sizeOf(copyRange(ELEMENTS, i, javaMutable.size() - i))); /* detect memory leak */
Added leak detection, i.e. if slice only keeps a few elements out of many, the trie's size should be small.
We can optimize the partition method, which currently first fills an ArrayList, then calls:

@Override
public Tuple2<Vector<T>, Vector<T>> partition(Predicate<? super T> predicate) {
    Objects.requireNonNull(predicate, "predicate is null");
    final ArrayList<T> left = new ArrayList<>(), right = new ArrayList<>();
    for (int i = 0; i < length(); i++) {
        final T t = get(i);
        (predicate.test(t) ? left : right).add(t);
    }
    return Tuple.of(ofAll(left), ofAll(right));
}

I think we could save CPU + Mem by calling:

@Override
public Tuple2<Vector<T>, Vector<T>> partition(Predicate<? super T> predicate) {
    Objects.requireNonNull(predicate, "predicate is null");
    BitMappedTrie<T> left = BitMappedTrie.empty(), right = BitMappedTrie.empty(); // not final: reassigned below
    for (T t : this) {
        if (predicate.test(t)) {
            left = left.append(t);
        } else {
            right = right.append(t);
        }
    }
    return Tuple.of(ofAll(left), ofAll(right));
}

We can also iterate elements instead of calling get for each element. Question: Does the JVM optimize
Wouldn't it be better to first store
You're basing these optimizations on the unoptimized version.

Edit: Actually, when optimizing, the JIT has a few assumptions (e.g. It's different though for iterator, when an internal class refers to the outer one, there extracting the
@danieldietrich, @ruslansennov, @zsolt-donca, @eduardmanas, did I answer or apply all your requests?
 *
 * @param <T> Component type of the Vector.
 * @author Ruslan Sennov
 * @author Ruslan Sennov, Pap Lőrinc
 * @see BitMappedTrie
Maybe my last comment :) BMT has package visibility, I believe this line should be removed (?)
I think this can be useful when someone's viewing the sources (I commented BMT thoroughly).
@danieldietrich?
@paplorinc Please remove the line. Instead you could modify the first line and mention that our Vector is based on a Bit Mapped Trie implementation.
Done, let's merge it, as I have at least 2 more PRs to prepare :)

@paplorinc Mine are all addressed!
 * @author Pap Lőrinc
 * @since 2.1.0
 */
final class BitMappedTrie<T> implements Serializable {
Hi @paplorinc, great work so far. This implementation is smaller than I expected!
Before pulling it I want to see just one more thing: please make BitMappedTrie Iterable and implement the iterator() method in a way that does not use get(index). Instead it should use the internal BMT structure to efficiently iterate the elements (i.e. not getting the leaf array on every step; instead use the offset until the end of the array is reached, right?).
This is inevitable because otherwise either the internal structure would leak to the outside, or we could not leverage the knowledge about the internal structure and would have to fall back to less efficient ways to access elements. The latter case is currently implemented (using get(index) on each step).
I think that it will be a better base for comparing the upcoming optimizations.
It shouldn't be much effort, should it?
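The requested iteration strategy can be sketched as follows. This is a simplification: the real trie has multiple levels and an offset, and the names here are illustrative, not the PR's actual code. The idea is to fetch each leaf array once and walk it with a local cursor, instead of paying a full tree descent via get(index) per element.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch only: iterates leaf arrays directly instead of calling get(index),
// fetching each leaf once and then walking it with a local offset.
public class LeafIterator<T> implements Iterator<T> {
    private final Object[][] leaves; // stand-in for the trie's leaf level
    private final int length;        // total number of elements
    private int leafIndex = 0, offset = 0, visited = 0;

    public LeafIterator(Object[][] leaves, int length) {
        this.leaves = leaves;
        this.length = length;
    }

    @Override
    public boolean hasNext() {
        return visited < length;
    }

    @SuppressWarnings("unchecked")
    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        while (offset == leaves[leafIndex].length) { // move to the next non-empty leaf
            leafIndex++;
            offset = 0;
        }
        visited++;
        return (T) leaves[leafIndex][offset++];
    }
}
```

Usage: `new LeafIterator<Integer>(new Object[][] { {1, 2}, {3} }, 3)` yields 1, 2, 3 while touching each leaf array exactly once.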
Sure, will do, but will have to delete it later when leading and trailing are introduced to optimize append and prepend (in order to be able to iterate over all arrays; cannot combine 3 iterators efficiently otherwise)
Force pushed a new version.
@danieldietrich, I thought you were interested in the simplest solution in this commit. I consider this explicit iterator an optimization.
We will see. In the design of types it is the right place for iterator(). We will take a look at leading and trailing next.
In Scala Vector is in between Array and List. List is best at prepend() & tail(), Array is best at random access get() and update(). This is the goal for Vector optimizations, it should be good at both and be in between Array and List. That's what I'm looking at, not the optimization of specific methods like groupBy() and slice() for example.
I'm really looking forward to review leading() and trailing(), especially on the performance changes of get(), update(), append(), prepend() and tail().
Iterator<T> iterator() {
    return new Iterator<T>() {
        int globalIndex = 0;
Please create final length = BitMappedTrie.this.length();

Pushed, could we please merge this and move on to the optimizations?
ok, thank you, pulling it in now

Please review #1522 also.
Fixes: Implement Vector as bit-mapped vector trie #568.
Dependency for: #1449, #1522 and #1557.
This is the non-optimized version that I only compared to Scala's Vector for now; feel free to compare it against: Automatic, internal primitive storage support for Vector #1557.

A bit-mapped trie is a very wide and shallow tree (for integer indices the depth will be ≤6).
Each node has a maximum of 32 children (configurable).
Access to a given position is done by converting the index to a base-32 number and using each digit to descend down the tree.
Modifying the tree is done similarly, but along the way the path is copied, returning a new root every time.
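The digit-by-digit descent can be sketched as a standalone example. The constants, the `depthShift` convention, and the array layout below are assumptions mirroring the description, not the actual implementation: with 32 children per node, each level consumes 5 bits of the index.

```java
public class RadixDescent {
    static final int BRANCHING_BITS = 5;               // 2^5 = 32 children per node
    static final int MASK = (1 << BRANCHING_BITS) - 1; // 0b11111, one base-32 "digit"

    /** descend from the root, consuming 5 bits of the index per level */
    static Object get(Object[] root, int depthShift, int index) {
        Object node = root;
        for (int shift = depthShift; shift > 0; shift -= BRANCHING_BITS) {
            node = ((Object[]) node)[(index >> shift) & MASK]; // pick child by digit
        }
        return ((Object[]) node)[index & MASK]; // last digit indexes the leaf
    }

    public static void main(String[] args) {
        // two-level trie holding 0..1023: 32 leaves of 32 boxed ints each
        final Object[] root = new Object[32];
        for (int i = 0; i < 32; i++) {
            final Object[] leaf = new Object[32];
            for (int j = 0; j < 32; j++) leaf[j] = i * 32 + j;
            root[i] = leaf;
        }
        System.out.println(get(root, BRANCHING_BITS, 1000)); // 1000
    }
}
```

Path copying for updates works the same way, except each visited node is cloned before the child pointer is replaced, yielding a new root while sharing all untouched subtrees.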
Append inserts into the last leaf or, if the tree is full from the right, adds another layer on top of it (the old root will be the first node of the new one).
Prepend is done similarly, but an offset is needed, because adding a new top node (where the current root would be the last node of the new root) shifts the indices by half of the current tree's full size. The offset shifts them back to the correct index.
Slice is done by trimming the path from the root and discarding any leading/trailing values in effectively constant time (without the memory leak, as in Java/Clojure).

Memory:

Speed:
For a small number of elements it often has the same speed; for larger ones it's often 2-3x slower.

Compared to the optimized version this is roughly:
~30x slower
~2x slower
~4x slower
~0.7x slower (i.e. ~1.4x faster)
~0.95x slower (i.e. ~1.05x faster)
~2.5x slower
~2.5x slower
~3.5x slower
~3.5x slower
~1.1x slower