Went through entire documentation updating code examples and references to TinkerPop.

Added gremlin.graph config option
Added TimeUnit to auto imports for gremlin shell
Updated gremlin server config
Fixed Elasticsearch ANTLR version collision with Cassandra
#872
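The new `gremlin.graph` option is TinkerPop's standard hook for instantiating a graph from a properties file. A minimal sketch of such a file (the factory class and backend values are assumptions and should be checked against the configs shipped with this commit):

[source, properties]
# open Titan through TinkerPop's GraphFactory mechanism
gremlin.graph=com.thinkaurelius.titan.core.TitanFactory
storage.backend=berkeleyje
storage.directory=/tmp/graph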
BrynCooke committed Apr 29, 2015
1 parent 435f046 commit f9e3200
Showing 50 changed files with 709 additions and 879 deletions.
5 changes: 1 addition & 4 deletions NOTICE.txt
@@ -13,10 +13,7 @@ This product includes software developed by Aurelius (http://thinkaurelius.com/)

It also includes software from other open source projects including, but not limited to (check pom.xml for complete listing):

-* TinkerPop Blueprints [http://blueprints.tinkerpop.com]
-* TinkerPop Gremlin [http://gremlin.tinkerpop.com]
-* TinkerPop Rexster [http://rexster.tinkerpop.com]
-* TinkerPop Frames [http://frames.tinkerpop.com]
+* TinkerPop [http://tinkerpop.incubator.apache.org/]
* Apache Commons [http://commons.apache.org/]
* Google Guava [http://code.google.com/p/guava-libraries/]
* HPPC [http://labs.carrotsearch.com/hppc.html]
2 changes: 1 addition & 1 deletion docs/TitanBus.md
@@ -7,7 +7,7 @@ The purpose of the trigger log is to capture the mutations of a transaction so t

The trigger log consists of multiple sub-logs as configured by the user. When opening a transaction, the identifier for the trigger sub-log can be specified:

-tx = graph.buildTransaction().setLogIdentifier("purchase").start();
+tx = g.buildTransaction().logIdentifier("purchase").start();

In this case, the identifier is "purchase" which means that the mutations of this transaction will be written to a log with the name "trigger_purchase". This gives the user control over where transactional mutations are logged. If no trigger log is specified, no trigger log entry will be created.
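For example, a minimal end-to-end sketch (the vertex and property are illustrative):

    tx = g.buildTransaction().logIdentifier("purchase").start()
    v = tx.addVertex()
    v.property("item", "book")
    tx.commit()  // these mutations are written to the "trigger_purchase" log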

40 changes: 17 additions & 23 deletions docs/advblueprints.txt
@@ -1,36 +1,30 @@
[[advanced-blueprints]]
-Advanced Blueprints
+Advanced TinkerPop
-------------------

-//image:https://raw.github.com/tinkerpop/blueprints/master/doc/images/blueprints-character-3.png[]
+http://tinkerpop.incubator.apache.org/[TinkerPop] provides a set of common http://tinkerpop.incubator.apache.org/docs/3.0.0-SNAPSHOT/#traversalstrategy[traversal strategies] that add additional functionality to graphs.

-http://blueprints.tinkerpop.com/[Blueprints] provides a set of common property graph interfaces by which any vendor can implement and leverage the http://tinkerpop.com[TinkerPop] stack of technologies. Within Blueprints, there are other utilities that are generally useful like import/export formats as well graph wrappers.

-Using IdGraph
+Using ElementIdStrategy
~~~~~~~~~~~~~

-It is possible to use Blueprints' https://github.com/tinkerpop/blueprints/wiki/Id-Implementation[IdGraph] with Titan. IdGraph requires a property named `__id` that maps arbitrary user-provided identifiers to Titan's internally-assigned long identifiers. This property name is also available programmatically as the public static string `IdGraph.ID`.
+It is possible to use http://tinkerpop.incubator.apache.org/docs/3.0.0-SNAPSHOT/#_elementidstrategy[ElementIdStrategy] with Titan. ElementIdStrategy allows an arbitrary property to be used as the element ID instead of Titan's long identifiers.

[IMPORTANT]
-The `__id` property key must be created and covered by a unique index in Titan prior to using `IdGraph` with Titan.
+The target property key must be created and covered by a unique index in Titan prior to using `ElementIdStrategy`; otherwise vertex lookups will result in sequential scans of the graph.

-To prepare Titan for IdGraph, first create the `__id` property key. Set the `dataType` of the property key to match the custom IDs that you intend to use. Second, build a unique composite index on the `__id` property key. The following example shows how to define and index the `__id` property key to support IdGraph with string vertex IDs.
+To prepare Titan for ElementIdStrategy, first create the property key. Set the `dataType` of the property key to match the custom IDs that you intend to use. Second, build a unique composite index on the property key. The following example shows how to define and index the property key to support ElementIdStrategy with string vertex IDs.

-[source,gremlin]
+[source, gremlin]
g = TitanFactory.open("berkeleyje:/tmp/test")
-// Define a property key and index for IdGraph-managed vertex IDs
-mgmt = g.getManagementSystem();
-id = mgmt.makePropertyKey(IdGraph.ID).dataType(String.class).make()
-mgmt.buildIndex("byvid",Vertex.class).addKey(id).unique().buildCompositeIndex()
+// Define a property key and index for managed vertex IDs
+mgmt = g.openManagement()
+idKey = mgmt.makePropertyKey("name").dataType(String.class).make()
+mgmt.buildIndex("byName", Vertex.class).addKey(idKey).unique().buildCompositeIndex()
mgmt.commit()
-// Create an IdGraph that manages vertex IDs but not edge IDs
-ig = new IdGraph(g, true, false)
-// Insert example vertex with custom identifier
-hercules = ig.addVertex("hercules")
-g.v("hercules")
-zeus = ig.addVertex("zeus")
+// Create a traversal source that manages vertex IDs but not edge IDs
+strategy = ElementIdStrategy.build().idPropertyKey("name").create()
+ig = GraphTraversalSource.build().with(strategy).create(g)

-// If only user defined ids on vertices (or edges) is needed, then use one of the overloaded `IdGraph` constructors. It is still helpful, although not strictly necessary, to define an index:
-//
-//[source,gremlin]
-//ig = new IdGraph(g, true, false) // true for vertices, false for edges
+// Insert example vertex with custom identifier
+hercules = ig.addV(T.id, "hercules")
+ig.V("hercules")
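Once defined this way, the custom identifier resolves through the unique `byName` index; a short usage sketch (expected output shown as a comment):

[source, gremlin]
// the custom id is stored in the 'name' property and looked up via the index
ig.V("hercules").values("name")
// ==> hercules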
47 changes: 28 additions & 19 deletions docs/advschema.txt
@@ -10,7 +10,7 @@ Static Vertices

Vertex labels can be defined as *static* which means that vertices with that label cannot be modified outside the transaction in which they were created.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
tweet = mgmt.makeVertexLabel('tweet').setStatic().make()
mgmt.commit()
@@ -30,38 +30,38 @@ The following storage backends support vertex and edge label TTL.
Edge TTL
^^^^^^^^

-Edge TTL is defined on a per-edge label basis, meaning that all edges of that label have the same time-to-live.
+Edge TTL is defined on a per-edge label basis, meaning that all edges of that label have the same time-to-live. Note that the backend must support cell-level TTL; currently only Cassandra supports this.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
visits = mgmt.makeEdgeLabel('visits').make()
-mgmt.setTTL(visits,7,TimeUnit.DAYS)
+mgmt.setTTL(visits, 7, DAYS)
mgmt.commit()

Note that modifying an edge resets the TTL for that edge. Also note that the TTL of an edge label can be modified, but it might take some time for this change to propagate to all running Titan instances, which means that two different TTLs can be temporarily in use for the same label.
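For instance (a sketch, assuming the `visits` label defined above and two pre-existing vertices `v1` and `v2`):

[source, gremlin]
e = v1.addEdge('visits', v2)
// any later mutation of the edge restarts the 7-day countdown
e.property('time', System.currentTimeMillis())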

Property TTL
^^^^^^^^^^^^

-Property TTL is very similar to edge TTL and defined on a per-property key basis, meaning that all properties of that key have the same time-to-live.
+Property TTL is very similar to edge TTL and defined on a per-property key basis, meaning that all properties of that key have the same time-to-live. Note that the backend must support cell-level TTL; currently only Cassandra supports this.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
sensor = mgmt.makePropertyKey('sensor').cardinality(Cardinality.LIST).dataType(Double.class).make()
-mgmt.setTTL(sensor,21,TimeUnit.DAYS)
+mgmt.setTTL(sensor, 21, DAYS)
mgmt.commit()

As with edge TTL, modifying an existing property resets the TTL for that property, and modifying the TTL for a property key might not take effect immediately.

Vertex TTL
^^^^^^^^^^

-Vertex TTL is defined on a per-vertex label basis, meaning that all vertices of that label have the same time-to-live. The configured TTL applies to the vertex, its properties, and all incident edges to ensure that the entire vertex is removed from the graph. For this reason, a vertex label must be defined as _static_ before a TTL can be set to rule out any modifications that would invalidate the vertex TTL. Vertex TTL only applies to static vertex labels.
+Vertex TTL is defined on a per-vertex label basis, meaning that all vertices of that label have the same time-to-live. The configured TTL applies to the vertex, its properties, and all incident edges to ensure that the entire vertex is removed from the graph. For this reason, a vertex label must be defined as _static_ before a TTL can be set to rule out any modifications that would invalidate the vertex TTL. Vertex TTL only applies to static vertex labels. Note that the backend must support store-level TTL; currently only Cassandra and HBase support this.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
tweet = mgmt.makeVertexLabel('tweet').setStatic().make()
-mgmt.setTTL(tweet,36,TimeUnit.HOURS)
+mgmt.setTTL(tweet, 36, HOURS)
mgmt.commit()

Note that the TTL of a vertex label can be modified, but it might take some time for this change to propagate to all running Titan instances, which means that two different TTLs can be temporarily in use for the same label.
@@ -71,16 +71,16 @@ Multi-Properties

As discussed in <<schema>>, Titan supports property keys with SET and LIST cardinality. Hence, Titan supports multiple properties with the same key on a single vertex. Furthermore, Titan treats properties similarly to edges in that single-valued property annotations are allowed on properties, as shown in the following example.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.LIST).make()
mgmt.commit()
v = g.addVertex()
-p1 = v.property('name','Dan LaRocque')
-p1.property('source','web')
-p2 = v.property('name','dalaro')
-p2.property('source','github')
-g.commit()
+p1 = v.property('name', 'Dan LaRocque')
+p1.property('source', 'web')
+p2 = v.property('name', 'dalaro')
+p2.property('source', 'github')
+g.tx().commit()
v.properties('name')
==> Iterable over all name properties
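To read the annotations back, each `name` property can be inspected individually (a sketch using the standard TinkerPop property API):

[source, gremlin]
v.properties('name').forEachRemaining { p ->
    // p.value() is the name itself; p.value('source') is its annotation
    println p.value() + ' (source: ' + p.value('source') + ')'
}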

@@ -93,20 +93,29 @@ Unidirected Edges

Unidirected edges are edges that can only be traversed in the out-going direction. Unidirected edges have a lower storage footprint but are limited in the types of traversals they support. Unidirected edges are conceptually similar to hyperlinks on the world wide web in the sense that the out-vertex can traverse through the edge, but the in-vertex is unaware of its existence.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
link = mgmt.makeEdgeLabel('link').unidirected().make()
mgmt.commit()

Unidirected edges can be added on edges and properties, thereby giving Titan limited support for hyper-edges. For example, this can be useful for capturing authorship provenance information for edges as shown in the following example, where we add a unidirected `author` edge on the `knows` edge to store the fact that `user` added this edge to the graph.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
mgmt.makeEdgeLabel('author').unidirected().make()
mgmt.commit()
user = g.v(4)
u = g.v(8)
v = g.v(16)
-v.addEdge('knows',u).property('author',user)
+v.addEdge('knows', u).property('author', user)


+mgmt = g.openManagement()
+mgmt.makeEdgeLabel('author').unidirected().make()
+mgmt.commit()
+user = g.addVertex(T.label, 'author')
+book = g.addVertex()
+author = g.addVertex()
+user.addEdge('knows', book).property('author', author)

Note that unidirected edges do not get automatically deleted when their in-vertices are deleted. The user must ensure that such inconsistencies do not arise or resolve them at query time by explicitly checking vertex existence in a transaction. See the discussion in <<ghost-vertices>> for more information.
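A defensive check along these lines (a sketch; `author` is assumed to hold the vertex stored on the `knows` edge above, and `g.vertices` is the TinkerPop 3 Graph API):

[source, gremlin]
// re-resolve the referenced vertex by id; an empty iterator means it was deleted
if (g.vertices(author.id()).hasNext()) {
    // safe to use the author vertex
} else {
    // the unidirected edge references a ghost vertex; skip or repair it
}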
12 changes: 6 additions & 6 deletions docs/bdb.txt
@@ -5,7 +5,7 @@ BerkeleyDB
//[.tss-center.tss-width-250]
//image:http://download.oracle.com/berkeley-db/docs/je/3.2.76/images/Oracle_BerkeleyDB_clr.bmp[link="http://www.oracle.com/technetwork/products/berkeleydb"]

-[quote,'http://www.oracle.com/technetwork/products/berkeleydb[BerkeleyDB Homepage]']
+[quote, 'http://www.oracle.com/technetwork/products/berkeleydb[BerkeleyDB Homepage]']
Berkeley DB enables the development of custom data management
solutions, without the overhead traditionally associated with such
custom projects. Berkeley DB provides a collection of well-proven
@@ -22,11 +22,11 @@ BerkeleyDB Setup

Since BerkeleyDB runs in the same JVM as Titan, connecting the two only requires a simple configuration and no additional setup:

-[source,java]
-TitanGraph g = TitanFactory.build()
-    .set("storage.backend", "berkeleyje")
-    .set("storage.directory", "/tmp/graph")
-    .open();
+[source, java]
+TitanGraph g = TitanFactory.build().
+    set("storage.backend", "berkeleyje").
+    set("storage.directory", "/tmp/graph").
+    open();

In the Gremlin shell, you cannot define the type of the variables `conf` and `g`. Therefore, simply leave off the type declaration.
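For example, in the Gremlin shell the same setup reads:

[source, gremlin]
g = TitanFactory.build().
    set("storage.backend", "berkeleyje").
    set("storage.directory", "/tmp/graph").
    open()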

4 changes: 2 additions & 2 deletions docs/building.txt
@@ -15,7 +15,7 @@ Depending on Titan Snapshots

For developing against the most current version of Titan, depend on Titan snapshot releases. Note that these releases are development releases and therefore unstable and likely to change. Unless one is interested in the most recent development status of Titan, we recommend using the stable Titan release instead.

-[source,xml]
+[source, xml]
<dependency>
<groupId>com.thinkaurelius.titan</groupId>
<artifactId>titan-core</artifactId>
@@ -27,7 +27,7 @@ SNAPSHOTs are available through the https://oss.sonatype.org/content/repositorie

When adding this dependency, be sure to add the following repository to the `pom.xml`:

-[source,xml]
+[source, xml]
<repository>
<id>sonatype-nexus-snapshots</id>
<name>Sonatype Nexus Snapshots</name>
8 changes: 4 additions & 4 deletions docs/bulkloading.txt
@@ -10,7 +10,7 @@ There are a number of configuration options and tools that make ingesting large
There are a number of use cases for bulk loading data into Titan, including:

* Introducing Titan into an existing environment with existing data and migrating or duplicating this data into a new Titan cluster.
* Using Titan as an end point of an http://en.wikipedia.org/wiki/Extract,_transform,_load[ETL] process.
* Adding existing or external graph datasets (e.g. publicly available http://linkeddata.org/[RDF datasets]) to a running Titan cluster.
* Updating a Titan graph with results from a graph analytics job.

@@ -82,12 +82,12 @@ During bulk loading, the load on the cluster typically increases making it more
//Titan-Hadoop
//^^^^^^^^^^^^

-//For very large graphs the best option to load data efficiently is <<hadoop,Titan-Hadoop>> using one of the supported input format and specifying Titan as the output format.
+//For very large graphs the best option to load data efficiently is <<hadoop, Titan-Hadoop>> using one of the supported input formats and specifying Titan as the output format.

//BatchGraph
//^^^^^^^^^^

-//For medium size graph datasets (up to 100s million edges), Blueprints' https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation[BatchGraph] is a useful tool for bulk loading data into Titan from a single machine through Titan's native Blueprints interface. BatchGraph effectively caches externally provided vertex ids to eliminate reads against Titan. This allows bulk loading with minimal read load.
+//For medium size graph datasets (up to 100s of millions of edges), TinkerPop's http://tinkerpop.incubator.apache.org/docs/3.0.0.M8-incubating/#_batchgraph[BatchGraph] is a useful tool for bulk loading data into Titan from a single machine through Titan's native Blueprints interface. BatchGraph effectively caches externally provided vertex ids to eliminate reads against Titan. This allows bulk loading with minimal read load.

//BatchGraph is limited to single machine bulk loading use cases and requires enough local RAM to hold the entire vertex id cache in memory. BatchGraph supports id compression to reduce the memory requirements. Please refer to the https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation[BatchGraph documentation] for more information on how to use BatchGraph most effectively.

@@ -112,7 +112,7 @@ If Hadoop cannot be used for parallelizing the bulk loading process, here are so
Data Sorting
^^^^^^^^^^^^

-Presorting the data to be bulk loaded can significantly increase the loading performance through BatchGraph. The https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation[BatchGraph] documentation describes this strategy in more detail. It has been reported that loading times were decreased by a factor of 2 or more when presorting the bulk loaded data.
+Presorting the data to be bulk loaded can significantly increase the loading performance through BatchGraph. The http://tinkerpop.incubator.apache.org/docs/3.0.0-SNAPSHOT/#_batchgraph[BatchGraph] documentation describes this strategy in more detail. It has been reported that loading times were decreased by a factor of 2 or more when presorting the bulk loaded data.
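As a rough illustration (a sketch; the file name and format are hypothetical, assuming one whitespace-delimited edge per line):

[source, gremlin]
// sort the edge list by out-vertex id so all edges incident to the same
// vertex are loaded together
edges = new File('edges.txt').readLines().
    collect { it.split(/\s+/) }.
    sort { it[0] }
edges.each { pair ->
    // load edge pair[0] -> pair[1] here, e.g. via addEdge on cached vertices
}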

Q&A
~~~
