Flow Rank Pattern

jbmusso edited this page Jul 11, 2016 · 24 revisions

Attention: this Wiki hosts an outdated version of the TinkerPop framework and Gremlin language documentation.

Please visit the Apache TinkerPop website and latest documentation.


Many times its important to determine how many times a particular element is traversed over. That is, determine the flow through an element. Particular examples include:

  • “Rank my friends friends by how many friends we share in common.”
  • “Rank items by how many people, who like the same things I like, like them.”
gremlin> software = []
gremlin> g.V('lang','java').fill(software)
==>v[3]
==>v[5]
gremlin> software
==>v[3]
==>v[5]

Vertices v[3] and v[5] are software projects. Lets determine who is on the most software projects. In other words, lets traverse out of these vertices and see which developer vertices get the most flow.

gremlin> software._().in('created').name.groupCount.cap
==>{marko=1, peter=1, josh=2}

Josh received the most traversals through him. The groupCount step maintains an internal Map<Object,Number>. The cap step is used to “cap” groupCount and have it emit its internal map, not the elements that flow through it. The following example better explains “capping” by demonstrating what happens when its not used.

gremlin> m = [:]
gremlin> software._().in('created').name.groupCount(m)
==>marko
==>josh
==>peter
==>josh
gremlin> m
==>marko=1
==>josh=2
==>peter=1

Here is a more complicated example using the loop step and the Grateful Dead graph diagrammed in Defining a More Complex Property Graph.

g = new TinkerGraph()
g.loadGraphML('data/graph-example-2.xml')

The example below will continue to loop until a counter reaches 1000 (so the loop doesn’t continue indefinitely). The loop will walk the outgoing edges of a vertex and update the flow map m. What is returned is how many times each song is traversed when starting from vertex 12 (Me and My Uncle). Finally, iterate() is appended so no results are outputted to the terminal.

gremlin> c = 0
==>0
gremlin> m = [:]
gremlin> g.v(12).as('x').out.groupCount(m){it.name}.loop('x'){c++ < 1000}.iterate()
gremlin> m
==>CHINA CAT SUNFLOWER=518
==>RAMBLE ON ROSE=527
==>WHARF RAT=213
==>HES GONE=439
==>HURTS ME TOO=112
==>SHIP OF FOOLS=345
==>FRIEND OF THE DEVIL=378
==>IT MUST HAVE BEEN THE ROSES=371
==>JACK STRAW=540
==>MEXICALI BLUES=369
==>CANDYMAN=345
==>LOOKS LIKE RAIN=471
==>DIRE WOLF=341
==>HELP ON THE WAY=245
...
gremlin> println g.v(12).as('x').out.groupCount(m){it.name}.loop('x'){c++ < 1000}
[StartPipe, LoopPipe([OutPipe, GroupCountFunctionPipe])]
==>null

Finally, you can sort your rankings and, for example, get the top 5 results.

gremlin> m.sort{a,b -> b.value <=> a.value}[0..4]
==>PLAYING IN THE BAND=587
==>ME AND MY UNCLE=570
==>JACK STRAW=540
==>EL PASO=532
==>RAMBLE ON ROSE=527