Authors: Tobias Lindaaker <tobias.lindaaker@neotechnology.com>
Above and beyond the types of patterns that can be expressed in Cypher using the normal path syntax, Cypher also supports what amounts to regular expressions over paths. Queries of this type are typically referred to as Regular Path Queries (RPQs). In Cypher Regular Path Queries can be expressed through the use of Path Patterns. The Path Patterns supported by Cypher are capable of supporting more than just regular expressions though.
A Path Pattern is defined as:
-
A simple relationship type
()-/:X/-()
denotes a Path Pattern matching relationships of typeX
. -
A predicate on the labels of a node
()-/(:Z)/-()
denotes a Path Pattern matching nodes with labelZ
. -
A predicate on the properties of a node
()-/({foo: bar})/-()
denotes a Path Pattern matching nodes where thefoo
property has the value of bar. -
A sequence of Path Patterns
()-/πΆ π½/-()
denotes a Path Pattern matching first the pattern defined byπΆ
, then the pattern defined byπ½
(in order left to right). -
An alternative between Path Patterns
()-/πΆ | π½/-()
denotes a Path Pattern matching either the pattern defined byπΆ
or the pattern defined byπ½
. -
A repetition of a Path Pattern
()-/πΆ*/-()
denotes a Path Pattern matching the pattern defined byπΆ
zero or more times.
()-/πΆ+/-()
denotes a Path Pattern matching the pattern defined byπΆ
one or more times.
()-/πΆ*x../-()
denotes a Path Pattern matching the pattern defined byπΆ
x
or more times.
()-/πΆ*x..y/-()
denotes a Path Pattern matching the pattern defined byπΆ
at leastx
times and at mosty
times. -
A grouping of a Path Pattern
()-/[πΆ]/-()
denotes a grouping of the patternπΆ
. -
A predicate on the properties of the edges in a path
()-/[πΆ {foo: bar}]/-()
denotes a Path Pattern where all the edges in the group that matchπΆ
must have afoo
property with the value of bar. -
A specification of direction for a Path Pattern
()-/ πΆ >/-()
denotes that the Path PatternπΆ
should be interpreted in a left-to-right direction.
()-/< πΆ /-()
denotes that the Path PatternπΆ
should be interpreted in a right-to-left direction.
()-/< πΆ >/-()
denotes that the Path PatternπΆ
should be interpreted in any direction. -
A reference to a Named Path Predicate
()-/~alpha/-()
denotes a reference to a Named Path Predicatealpha
.
Binding of a relationship to a variable is only allowed in the most simple case of a Path Pattern, where only a single relationship is matched by the pattern.
For binding a whole path to a variable, Path Assignment should be used, by preceding the path with an identifier and an equals sign (=
).
This avoids a problem that existed in the past with repetition of relationships (a syntax that is unsupported as of the introduction of Path Patterns), where a relationship variable would bind to a list, making it hard to express predicates over the actual relationships.
Predicates on parts of a Path Pattern are instead expressed through the use of explicitly Named Path Predicates.
Path Patterns are part of the Pattern syntax of Cypher.
Pattern = PatternPart, {',', PatternPart} ;
PatternPart = [Variable, '='], NodePattern, {(EdgePattern | PathPattern), NodePattern} ;
NodePattern = '(', [Variable], [NodeLabels], [Properties], ')' ;
EdgePattern = (LeftArrowHead, Dash, [EdgeBody], Dash, RightArrowHead)
| (LeftArrowHead, Dash, [EdgeBody], Dash)
| (Dash, [EdgeBody], Dash, RightArrowHead)
| (Dash, [EdgeBody], Dash)
;
EdgeBody = '[', [Variable], [EdgeLabels], [Properties], ']' ;
PathPattern = (LeftArrowHead, Dash, '/', PathExpression, '/', Dash, RightArrowHead)
| (LeftArrowHead, Dash, '/', PathExpression, '/', Dash)
| (Dash, '/', PathExpression, '/', Dash, RightArrowHead)
| (Dash, '/', PathExpression, '/', Dash)
;
PathExpression = {PathAlternative} ;
PathAlternative = PathRepetition, {'|', PathRepetition} ;
PathRepetition = PathDirection, [('*', [RangeDetail]) | '+' | '?'] ;
PathDirection = ['<'], PathBase, ['>'] ;
PathBase = PathEdge
| PathAny
| PathNode
| PathReference
| PathGroup
;
PathEdge = ':', LabelName ;
PathAny = '-' ;
PathNode = '(', [NodeLabels], [Properties], ')' ;
PathReference = '~', SymbolicName ;
PathGroup = '[', PathExpression, [Properties], ']' ;
EdgeLabel = ':', LabelName ;
EdgeLabels = ':', LabelName, {'|', [':'], LabelName} ;
NodeLabels = ':', LabelName, ({':', LabelName} | {'|', LabelName}) ;
LabelName = SymbolicName ;
RangeDetail = [IntegerLiteral | Parameter], '..', [IntegerLiteral | Parameter] ;
The PathReference
is a reference to a Named Path Predicate.
These are defined using the following syntax:
NamedPathPredicate = 'PATH', 'PATTERN', NamedPathName, '=', PathPattern, [Where] ;
NamedPathName = SymbolicName ;
The direction of relationships matched by a Path Pattern is primarily decided by the directional arrow surrounding the pattern.
If the arrow points from left to right (i.e. (left)-/~pattern/->(right)
), the paths described by the pattern are paths in the left-to-right direction, i.e. paths that are outgoing from the node to the left of the pattern, and incoming to the node to the right of the pattern.
If the arrow points from right to left (i.e. (left)<-/~pattern/-(right)
), the paths described by the pattern are paths in the right-to-left paths direction, i.e. paths that are incoming to the node to the left of the pattern, and outgoing from the node to the right of the pattern.
If there are no arrowheads (i.e. (left)-/~pattern/-(right)
), or if both arrowheads are present (i.e. (left)<-/~pattern/->(right)
), the paths described by the pattern are paths in either the left-to-right or the right-to-left direction.
All parts of a Path Pattern will assume the direction of the surrounding arrow, unless the direction is explicitly overridden for that particular part of the pattern.
A prefix of <
to part of a pattern overrides the direction of that part to be right-to-left.
A suffix of >
to part of a pattern overrides the direction of that part to be left-to-right.
Both a <
prefix and a >
suffix can be used on the same part of the pattern to override the direction of that part to be either direction.
Direction overrides only apply to a single pattern part.
In order to apply the direction override to multiple parts of the pattern, those parts should be grouped.
Using both a <
prefix and a >
suffix on the same pattern is always the same thing as a disjunction between that pattern with a <
prefix and that pattern with a >
suffix.
This means that ()-/< πΆ >/-()
is the same as ()-/[< πΆ] | [πΆ >]/-()
.
When a Named Path Predicate is referenced the direction of reference is matched with the direction in the declaration of the Named Path Predicate. If the declaration of the Named Path Predicate is defined left-to-right, but the direction of the reference is right-to-left, the direction of definition of the the Named Path Predicate is reversed to match that of the reference. The same reversal applies if the Named Path Predicate is defined right-to-left but the direction of the reference is left-to-right. If the direction of the reference is either direction, the Named Path Predicate is matched both in its declared direction and its reversed direction. If a Named Path Predicate is declared without a direction, the direction of the reference does not matter, since the direction of the Named Path Predicate is inherently any direction. A Named Path Predicate declared without a direction must have a definition that is equivalent if reversed.
-
()-/a <[b c] d/->()
is the same as()-/a/->()<-/b c/-()-/d/->(d)
, i.e. the direction of the groupb c
has been overridden to be right-to-left in a pattern where the overall direction is left-to-right. -
()-/a <b> c/->()
is the same as()-/a/->()-/b/-()-/c/->()
, i.e. the direction ofb
has been overridden to be either direction. -
()-/a/-()
,()-/<a>/-()
,()-/<a>/->()
,()<-/<a>/-()
,()<-/<a>/->()
, and()<-/a/->()
all mean the same thing: matchinga
in either direction.
Given these Named Path Predicates:
PATH PATTERN alpha = ()-[:X]->()-[:Y]->()
PATH PATTERN beta = ()<-[:Y]-()<-[:X]-()
PATH PATTERN gamma = ()-/[:X :Y]> | <[:Y :X]/-()
-
()-/~alpha/->()
is equivalent to()<-/~beta/-()
-
()<-/~alpha/-()
is equivalent to()-/~beta/->()
-
()-/~gamma/->()
is equivalent to()<-/~gamma/-()
, since both are equivalent to()-/~gamma/-()
-
()-/~gamma/-()
is equivalent to()-/~alpha/-()
, since()-/~alpha/-()
is the same as()-/~alpha> | <~alpha/-()
, which is equivalent to the declaration ofgamma
.
It is also equivalent to()-/<~beta | ~beta>/-()
which is the same as()-/~beta/-()
.
The astute reader of the syntax will have noticed that it is possible to express a Path Pattern with an empty path expression:
MATCH (a)-//-(b)
The semantics of this query is to match any single relationship between a
and b
.
It is thus equivalent to (a)-/-/-(b)
or (a)--(b)
.
The same rule applies to groupings within a pattern, an empty group matches a single edge.
It is possible to express a completely empty pattern, a pattern that matches a
and b
to the same node.
This is done by using only a single node predicate in the path pattern:
MATCH (a)-/()/-(b)
This pattern states that a
and b
must be the same node, by virtue of stating a pattern that matches any node.
It is thus the same as:
MATCH (a), (b) WHERE a = b
The Path Patterns start becoming interesting when larger expressions are put together:
MATCH (you)-/[:KNOWS :HATES]+ :LOVES/->(someone)
Note the +
expressing one or more occurrences of the sequence KNOWS
followed by HATES
.
Using the arrowhead syntax introduced in Directions, consider the following query:
MATCH (you)-/[:KNOWS <:HATES]+ :LOVES/->(someone)
In the example above we say that the HATES
relationships should have the opposite direction to the other relationships in the path.
Through the use of Named Path Predicates we can express even more predicates over a path:
PATH PATTERN unreciprocated_love = (a)-[:LOVES]->(b)
WHERE NOT EXISTS { (b)-[:LOVES]->(a) }
MATCH (you)-/~unreciprocated_love*/->(someone)
Note how there is no colon used for referencing the Named Path Predicate, the colon is used in Path Patterns only for referencing actual relationship types.
Sometimes it will be interesting to express a predicate on a node in a Path Pattern. This can be achieved by using a Named Path Predicate where the nodes on both ends are the same:
PATH PATTERN not_a_hater = (x)
WHERE NOT EXISTS { (x)-[:HATES]->() }
MATCH (you)-/:KNOWS ~not_a_hater :KNOWS/-(friend_of_friendly_friend)
When the pattern of a Named Path Predicate is reflexive, the direction in which the predicate is used is irrelevant. A pattern is reflexive if it consists of a single node, if the first and last node of the pattern are the same, or if the pattern is symmetrical as in this example:
PATH PATTERN co_author = (a)-[:AUTHORED]->(:Book)<-[:AUTHORED]-(b)
MATCH (you)-/~co_author*/-(someone)
(a,b)
where every second node in the path has label A
PATH PATTERN a_and_other = (:A)--()
MATCH (a)-/ [~a_and_other -]* | [- ~a_and_other]* /-(b)
RETURN a, b
(a,b)
where there are at least 2 instances of a node labelled X
linked to a node labelled Y
in the pathMATCH (a)-/-* (:X)-(:Y) -* (:X)-(:Y) -*/-(b)
RETURN a, b
Some things are not possible to express using the Path Pattern syntax, a few of these things are worth highlighting.
It is not possible to denote a pattern that matches a pair of nodes that does not have a path matching a certain Path Pattern between them. The reason why it is not possible to match such a pattern using the Path Pattern syntax is because a matching instance would not be a path. There would be a discontinuity of a pair of nodes in the result of that pattern that has no edges in between them, and such a result is not a path. Path Patterns always match paths, so therefore it is not possible to express such a pattern.
It is however possible to match a pair of nodes (a
and b
) that does not have a path matching a given Path Pattern (πΆ
) between them, it is just not possible to express that as a path:
MATCH (a), (b)
WHERE NOT EXISTS { (a)-/πΆ/-(b) }
Queries like this are generally not tractable, so arguably it is a good thing that they are not easy to express.
While it is possible to express that a certain property should have the same value for all nodes in a path (by saying that each pair of nodes should have the same property value), it is not possible to express that all nodes should have a different property value. It has been shown that computing such paths would not be tractable in the general case, so perhaps it is a good thing to not be able to express this.
There is also no convenient way to express that the value of a certain property should in all nodes of the path be different from the value of that property in the first node of the path. Having a different value from the property in the first node is a tractable simplification of the problem of differing property values that can be solved by Regular Expressions with Memory (REMs). Since Cypher uses lexical scoping of the variables in a path pattern, Cypher is closer to the Regular Queries with Binding variant of REMs, which has been shown not to be able to express such queries.
It is however possible to express this type of path by venturing outside of the Path Pattern syntax and use a predicate over the entire path:
foo
property from the first nodePATH PATTERN different_from_first = (first)-/~some_pattern/-()
WHERE all( n IN nodes(different_from_first)
WHERE n = first OR n.foo <> first.foo )
GXPath | Cypher | |
---|---|---|
The empty pattern, from a node to itself, via nothing |
|
|
Match an edge with any label |
|
|
Match edge with a given label |
|
|
Inverted direction of an edge |
|
|
Match πΆ 0 or more times |
|
|
Match πΆ followed by π½ |
|
|
Disjunction: Either match πΆ or match π½ |
|
|
Any pair of nodes not reachable via πΆ |
|
not supported |
Node matching a given Node Predicate |
|
|
Repeat pattern πΆ between |
|
|
Path through πΆ, where data value of origin node is equal to value at destination node |
|
|
Path through πΆ, where data value of origin node differs from value at destination node |
|
|
Conjunctions (not in GXPath, allows CRPQs) Note that in this case Cypher requires one of the patterns needs to be chosen as the main pattern, this is the pattern that will be seen when binding the matched path |
|
|
GXPath | Cypher | |
---|---|---|
Node has a path matching a path expression |
|
|
Negation of predicate |
|
|
Conjunction of predicates |
|
|
Disjunction of predicates |
|
|
Value equal to constant |
|
|
Value not equal to constant |
|
|
Value reachable from node by path πΆ equal to value reachable by path π½ |
|
|
Value reachable from node by path πΆ differs from value reachable by path π½ |
|
|
Regular Expressions with Memory does not have bounded scope for the memory of variables, since it is an algebra designed to model a register automata. An alternative that does have lexical scoping is called Regular Expressions with Binding, and is proven to be a subset of Regular Expressions with Memory, fully translatable to Regular Expressions with Memory. Regular Expressions with Binding is thus more in line with what an actual language would express, and possible to map to Cypher.
In the table below π£
is partial function from a variable x
to the memory domain π
.
Regular Expressions with Binding | Cypher | |
---|---|---|
Empty path |
|
|
Single edge |
|
|
Inversion of single edge |
|
|
Concatenation |
|
|
Disjunction |
|
|
Transitive closure |
|
|
Data value (and memory state π£) matching condition |
|
|
Assignment of variable |
|
|
Note that in the assignment case in Cypher, the scope of the variables is within a single Named Path Predicate. Internal references to other Named Path Predicates will not have those variables in scope. It is thus important for the expressive power that the other composition rules above do not need to be expressed through Named Path Predicates. Even so the expressive power in terms of variable scope is less in Cypher than it is in Regular Expressions with Binding, since there are no nested scopes in Cyphers Named Path Predicates.
The Named Path Predicates of the Cypher Path Patterns allow the definition of what amounts to a context free language over paths in the graph. Here we will show that type of compositions possible in a context free grammar, have corresponding constructs in Cypher.
In the Context Free Grammar column below, upper case latin characters are used to denote non-terminal symbols, lower case latin characters denote terminal symbols, and greek characters are used to denote strings of non-terminal or terminal symbols.
Context Free Grammar | Cypher | |
---|---|---|
Empty production |
|
|
Terminal productions |
|
|
Disjunctions, i.e. Alternatives |
|
|
Concatenation |
|
|
Transitive closure |
|
|
This allows path patterns that match paths that are typically considered context free, such as balanced pairs:
PATH PATTERN cousin = ()-/:PARENT> [ ~cousin | ()] <:PARENT/-()
MATCH (me)-/~cousin/-(my_cousin)
RETURN me, collect(my_cousin) AS cousins
// now all we need is you and your cousins, and we have a song by Vampire Weekend