Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast relation traversals for rule materialisation #6382

Merged

Conversation

flyingsilverfin
Copy link
Member

@flyingsilverfin flyingsilverfin commented Jun 25, 2021

What is the goal of this PR?

We dramatically improve the performance of a rule materialisation by implementing a new traversal type and traversal algorithm for relation lookups (when all role players and roles are known)

What are the changes implemented in this PR?

  • Create a new type of iterator: FunctionalIterator.Sorted<T> which always returns an element greater (by it's well-defined comparison) than the prior one
    • create the corresponding Distinct/Filtered/Mapped/FlatMerged iterators which transform sorted iterators into other sorted iterators
  • Create a new subtype of FunctionalIterator.Sorted called Forwardable which implement an efficient forward(T target) method, which supports forward seeks
  • ByteArray is now sortable based on lexicographical ordering, which is what RocksDB keys use too
  • Create ThingAdjacency.DirectedEdge which is a sortable wrapper around undirected ThingEdge. We then create a new set of APIs for retrieving sorted/forwardable edge iterators from ThingAdjacency
  • Create a new kind of Edge: ThingEdge.Target, which may or may not exist between two ThingVertex with a given encoding. This is used for seeking a sorted iterator of ThingEdge
  • implement the new fast relation traversal algorithm in //traversal/iterator/RelationIterator, used in combination with a RelationTraversal, which only allows connecting a single relation over multiple role-player edges (eg. star-shaped)

@@ -220,6 +222,87 @@ public final int hashCode() {
}
}

public static class Virtual extends ThingEdgeImpl implements ThingEdge {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we create a virtual edge that may not exist when we want to use a seek operation on an edge iterator! Pretty interesting concept!

import static com.vaticle.typedb.core.common.exception.ErrorMessage.Internal.ILLEGAL_ARGUMENT;
import static com.vaticle.typedb.core.common.exception.ErrorMessage.Internal.ILLEGAL_STATE;

public class FlatMergeSortedIterator<T, U extends Comparable<? super U>> extends AbstractFunctionalIterator.Sorted<U> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this iterator merges N sorted iterators into another sorted iterator using a Min-Heap data structure

@@ -203,7 +206,7 @@ public ThingVertex convertToReadable(VertexIID.Thing iid) {
VertexIID.Thing iid = generate(keyGenerator, typeVertex.iid(), typeVertex.properLabel());
ThingVertex.Write vertex = new ThingVertexImpl.Write.Buffered(this, iid, isInferred);
thingsByIID.put(iid, vertex);
thingsByTypeIID.computeIfAbsent(typeVertex.iid(), t -> new ConcurrentSet<>()).add(vertex);
thingsByTypeIID.computeIfAbsent(typeVertex.iid(), t -> new ConcurrentSkipListSet<>()).add(vertex);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to support using sorted iterators, we must use natively navigable data structures - in this case we use a ConcurrentSkipList to read a sorted iterator from

@@ -406,8 +406,8 @@ public References references() {

public FunctionalIterator<RuleStructure> all() {
Encoding.Prefix index = IndexIID.Rule.prefix();
FunctionalIterator<RuleStructure> persistedRules = storage.iterate(index.bytes(), (key, value) ->
convert(StructureIID.Rule.of(value)));
FunctionalIterator<RuleStructure> persistedRules = storage.iterate(index.bytes(), Storage.SortedPair::new)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't want necessarily make StructureIID implements ByteComparable, so have a sortable pair that we can then use a normal map operation over it, discarding the sortedness property we don't care about in this case

Comment on lines 60 to 73
default ThingIteratorSortedBuilder edge(Encoding.Edge.Thing encoding, IID... lookAhead) {
if (encoding == Encoding.Edge.Thing.HAS) return edgeHas(lookAhead);
else if (encoding == Encoding.Edge.Thing.PLAYING) return edgeHas(lookAhead);
else if (encoding == Encoding.Edge.Thing.RELATING) return edgeHas(lookAhead);
else if (encoding == Encoding.Edge.Thing.ROLEPLAYER) {
if (lookAhead.length > 0) return edgeRolePlayer(lookAhead[0], Arrays.copyOfRange(lookAhead, 1, lookAhead.length));
else throw TypeDBException.of(ILLEGAL_OPERATION);
}
else throw TypeDBException.of(ILLEGAL_STATE);
}
ThingIteratorSortedBuilder edgeHas(IID... lookAhead);
ThingIteratorSortedBuilder edgePlaying(IID... lookAhead);
ThingIteratorSortedBuilder edgeRelating(IID... lookAhead);
ThingIteratorSortedBuilder edgeRolePlayer(IID roleType, IID... lookAhead);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only under certain circumstances can we provide a sorted edge iterator - specifically any has/relating/playing edge iterator, or a roleplayer edge when also given a roleType lookahead. We enforce this via the function signature

Comment on lines 182 to 151
abstract class EdgeDirected implements Comparable<EdgeDirected> {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a new construct: we wrap an Edge which is undirected but therefore not sortable -- if it is an In adjacency, it may be sortable but in an Out. In other words, we can only sort in the direction of the adjacency. This wrapper enables us to return sorted edge iterators

Comment on lines +101 to +105
@Override
public int compareTo(ThingVertex o) {
return iid.bytes().compareTo(o.iid().bytes());
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thing vertices, which we want to sort, and directed edges are all sortable and therefore require compareTo

Comment on lines 89 to 93
public void seek(T target) {
if (state == State.INIT) initialise(target.getBytes());
else internalRocksIterator.seek(target.getBytes().getArray());
state = State.SEEKED_EMPTY;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we seek but don't want to immediately check if there's a valid next, we introduce a new state SEEKED_EMPTY which will just check the current key without calling next() (which progresses the rocksDB iterator)

import static com.vaticle.typedb.core.common.iterator.Iterators.iterate;
import static java.util.stream.Collectors.toMap;

public class RelationIterator extends AbstractFunctionalIterator<VertexMap> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the new Fast relation lookup algorithm. Did my best to make it as simple as possible, but still good to go through I think!

@flyingsilverfin flyingsilverfin changed the title Fast relation lookup Implement fast Relation traversal to improve materialisation performance Jul 7, 2021
@flyingsilverfin flyingsilverfin changed the title Implement fast Relation traversal to improve materialisation performance Fast relation traversals for rule materialisation Jul 7, 2021
@flyingsilverfin flyingsilverfin marked this pull request as ready for review July 7, 2021 17:29
@flyingsilverfin flyingsilverfin added this to the 2.2.1 milestone Jul 7, 2021
@@ -30,6 +30,7 @@ java_library(

# External Maven Dependencies
"@maven//:com_github_ben_manes_caffeine_caffeine",
"@maven//:com_google_guava_guava",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use guava for lexicographical byte array comparison -- it contains an optimised implementation that is OS dependent

Comment on lines 60 to 62
public ByteArray getBytes() {
return this;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ByteComparable classes must implement getBytes() which pushed the old getBytes() to become getArray()

Comment on lines 23 to 32
public interface ByteComparable<T extends ByteComparable<T>> extends Comparable<T> {

ByteArray getBytes();

@Override
default int compareTo(T o) {
return getBytes().compareTo(o.getBytes());
}

}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this interface ensures that anything that is byte comparable is compared the exact same way, using our lexicographical ordering (which matches RocksDB's comparator exactly)

common/iterator/AbstractFunctionalIterator.java Outdated Show resolved Hide resolved
private T next;
private T last;

public Sorted(NavigableSet<T> source) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "basis" of a Sorted functional iterator is a NavigableSet, which can critically be used to implement seek()

@@ -113,7 +113,7 @@ public static Type of(VertexIID.Type start, Encoding.Infix infix, VertexIID.Type
}
}

public static class Thing extends EdgeIID<Encoding.Edge.Thing, InfixIID.Thing, VertexIID.Thing, VertexIID.Thing> {
public static class Thing extends EdgeIID<Encoding.Edge.Thing, InfixIID.Thing, VertexIID.Thing, VertexIID.Thing> implements Comparable<Thing> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anything we want to use in sortable iterators must be comparable

@@ -570,7 +572,7 @@ public Relation asRelation() {
playersWithIIDs.add(playerId);
}
});
return traversalEng.iterator(traversal).map(conceptMgr::conceptMap)
return traversalEng.relations(traversal).map(conceptMgr::conceptMap)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our new type of traversal for fast relation lookups is on traversalEng.relations()!

Comment on lines 79 to 80
case SEEKED_EMPTY:
return checkValidNext();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after we have done a forward(), we want to lazily check whether there is a valid next element without actually calling next() as this moves the rocks iterator forward again. To introduce this we create a new state SEEKED_EMPTY which does half the work of EMPTY, just checking iterator validity without progressing the rocks iterator to the next element

@@ -185,7 +186,7 @@ public ByteArray get(ByteArray key) {
}

@Override
public <G> FunctionalIterator<G> iterate(ByteArray key, BiFunction<ByteArray, ByteArray, G> constructor) {
public <G extends Bytes.ByteComparable<G>> FunctionalIterator.Sorted<G> iterate(ByteArray key, BiFunction<ByteArray, ByteArray, G> constructor) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now anything we iterate from storage must be ByteComparable. We can mapSorted it later to convert it to other types as needed, all of which must be comparable as well

Comment on lines 69 to 70
public FunctionalIterator<VertexMap> relations(Traversal traversal) {
return traversal.relations(graphMgr);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new fast relation lookup!

Comment on lines 225 to 226
FunctionalIterator.Sorted<ThingVertex> storageIterator = storage.iterate(prefix)
.mapSorted(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be one line

Comment on lines 126 to 139
public final FunctionalIterator.Sorted.Forwardable<T> merge(FunctionalIterator.Sorted.Forwardable<T>... iterators) {
return Iterators.Sorted.merge(this, iterators);
}

@Override
public <U extends Comparable<? super U>> FunctionalIterator.Sorted.Forwardable<U> mapSorted(Function<T, U> mappingFn, Function<U, T> reverseMappingFn) {
return Iterators.Sorted.mapSorted(this, mappingFn, reverseMappingFn);
}

@Override
public FunctionalIterator.Sorted.Forwardable<T> distinct() {
return Iterators.Sorted.distinct(this);
}

@Override
public FunctionalIterator.Sorted.Forwardable<T> filter(Predicate<T> predicate) {
return Iterators.Sorted.filter(this, predicate);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have to re-implement these in each Forwardable class because we can't do multiple inheritance

expose edge iterator as a sorted edge iterator

fix sorted iterator bugs

fix merge iterator

split ThingAdjacency into sorted and unsorted builders

fix arraycopy

wip

Implement discussed comparators and sorted iterator interface

implement BaseIterator.Sorted from NavigableSet

rebase master and back to buildable

fix sortability of vertex based on edge

fix remove() call

fix rule retrieval sortedness

implement edges sorted correctly

ByteArrays use unsigned byte-by-byte ordering

byte array comparison use unsigned byte comparisons

loading rules obeys sortedness of storage iteration

expose EdgeSortable outwardly

EdgeSortable implement hashcode and equals

clean up TODOs and implement seek() in FlatMerge

fix license

rebase successful

cleanup ByteComparable to be only in minmal places

fix sortable type edges

fix build

make distinct() in ThingGraph.getReadable() efficient

fix null ptr

cleanup

cleanup

revert NavigableSet in ThingGraph

cleanup

Revert "revert NavigableSet in ThingGraph"

This reverts commit f9f48d4.

parent a35c269
author joshua <joshuasend1@gmail.com> 1625158655 +0100
committer joshua <joshuasend1@gmail.com> 1625675337 +0100

implement fast relation lookup algorithm

add missing impls and fix bugs

refactor RelationIterator, add assertions

use relation type information in lookahead

fix filtered iterator fetch

cleaning up

refactor RocksIterator changes

refactor RelationIterator

All sorted iterators validate no backward seeks are allowed

cleanup
RocksIterator always returns KeyValue

fix build error

refactor ByteArray method names

cleanup comments

rename FlatMerge to MergeMap

implement reverse mapping assertion and rename to past tense iterator

remove newline and add parameter

make constructors non-public

Add KeyValue equality and hide public constructors of iterators

inline inner classes

rename edge builders

fix adjacency build

wip stash decisions
Replace previous usages of Sorted with Sorted.Forwardable

Create optimised and non-optimised Thing edge encodings

Revert split api of edgeHas/edgePlaying/edgeRelating/edgeRolePlayer

cleanup

cleanup

reorganise RelationIterator
cleanup encoding enum split

rename Edge.Thing.Data to Edge.Thing.Base

rename ThingEdgeImpl.Virtual to Target

remove merge() varargs in Sorted iterator

simplify ThingAdjacencyImpl.SortedIteratorBuilder

reorder and rename RocksIterator apis

cleanup imports

cleanup

cleanup

simplify matchRelation

add hasNext catch
RelationIterator can handle multiple roles and relation types

tighten up RelationTraversal API to only allow star shaped relations

cleanup redudant variable

move some APIs to RelationTraversal

Cleaned up implementation of RelationIterator

address comment
@flyingsilverfin flyingsilverfin merged commit 1f6add9 into vaticle:master Jul 23, 2021
@flyingsilverfin flyingsilverfin deleted the fast-relation-lookup-2 branch July 23, 2021 11:47
haikalpribadi pushed a commit that referenced this pull request Jul 28, 2021
## What is the goal of this PR?

Using a RocksIterator after it is closed can cause a `SIGSEGV`, which crashes the server. #6382 introduced a new API on RocksIterator that was not protected against `close()` race conditions, which is now corrected.

## What are the changes implemented in this PR?

* since `forward()` is synchronized, we only add a check to see if the iterator is completed before performing a seek() operation on the rocks iterator. This prevents utilising a closed rocksDB resouce, which would cause the SIGSEGV
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants