Went through entire documentation updating code examples and references to TinkerPop.

Added gremlin.graph config option
Added TimeUnit to auto imports for gremlin shell
Updated gremlin server config
Fixed Elasticsearch ANTLR version collision with Cassandra
#872
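The new `gremlin.graph` option is TinkerPop's standard hook for instantiating a graph from a properties file. A minimal sketch of such a file (the factory class and backend values are assumptions and should be checked against the configs shipped with this commit):

[source, properties]
# open Titan through TinkerPop's GraphFactory mechanism
gremlin.graph=com.thinkaurelius.titan.core.TitanFactory
storage.backend=berkeleyje
storage.directory=/tmp/graph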
BrynCooke committed Apr 29, 2015
1 parent 435f046 commit f9e3200
Showing 50 changed files with 709 additions and 879 deletions.
5 changes: 1 addition & 4 deletions NOTICE.txt
@@ -13,10 +13,7 @@ This product includes software developed by Aurelius (http://thinkaurelius.com/)

It also includes software from other open source projects including, but not limited to (check pom.xml for complete listing):

-* TinkerPop Blueprints [http://blueprints.tinkerpop.com]
-* TinkerPop Gremlin [http://gremlin.tinkerpop.com]
-* TinkerPop Rexster [http://rexster.tinkerpop.com]
-* TinkerPop Frames [http://frames.tinkerpop.com]
+* TinkerPop [http://tinkerpop.incubator.apache.org/]
* Apache Commons [http://commons.apache.org/]
* Google Guava [http://code.google.com/p/guava-libraries/]
* HPPC [http://labs.carrotsearch.com/hppc.html]
2 changes: 1 addition & 1 deletion docs/TitanBus.md
@@ -7,7 +7,7 @@ The purpose of the trigger log is to capture the mutations of a transaction so t

The trigger log consists of multiple sub-logs as configured by the user. When opening a transaction, the identifier for the trigger sub-log can be specified:

-tx = graph.buildTransaction().setLogIdentifier("purchase").start();
+tx = g.buildTransaction().logIdentifier("purchase").start();

In this case, the identifier is "purchase" which means that the mutations of this transaction will be written to a log with the name "trigger_purchase". This gives the user control over where transactional mutations are logged. If no trigger log is specified, no trigger log entry will be created.
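For example, a minimal end-to-end sketch (the vertex and property are illustrative):

    tx = g.buildTransaction().logIdentifier("purchase").start()
    v = tx.addVertex()
    v.property("item", "book")
    tx.commit()  // these mutations are written to the "trigger_purchase" log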

40 changes: 17 additions & 23 deletions docs/advblueprints.txt
@@ -1,36 +1,30 @@
[[advanced-blueprints]]
-Advanced Blueprints
+Advanced TinkerPop
-------------------

-//image:https://raw.github.com/tinkerpop/blueprints/master/doc/images/blueprints-character-3.png[]
+http://tinkerpop.incubator.apache.org/[TinkerPop] provides a set of common http://tinkerpop.incubator.apache.org/docs/3.0.0-SNAPSHOT/#traversalstrategy[traversal strategies] that add additional functionality to graphs.

-http://blueprints.tinkerpop.com/[Blueprints] provides a set of common property graph interfaces by which any vendor can implement and leverage the http://tinkerpop.com[TinkerPop] stack of technologies. Within Blueprints, there are other utilities that are generally useful like import/export formats as well graph wrappers.

-Using IdGraph
+Using ElementIdStrategy
~~~~~~~~~~~~~

-It is possible to use Blueprints' https://github.com/tinkerpop/blueprints/wiki/Id-Implementation[IdGraph] with Titan. IdGraph requires a property named `__id` that maps arbitrary user-provided identifiers to Titan's internally-assigned long identifiers. This property name is also available programmatically as the public static string `IdGraph.ID`.
+It is possible to use http://tinkerpop.incubator.apache.org/docs/3.0.0-SNAPSHOT/#_elementidstrategy[ElementIdStrategy] with Titan. ElementIdStrategy allows an arbitrary property to be used as the element ID instead of Titan's long identifiers.

[IMPORTANT]
-The `__id` property key must be created and covered by a unique index in Titan prior to using `IdGraph` with Titan.
+The target property key must be created and covered by a unique index in Titan prior to using `ElementIdStrategy`; otherwise vertex lookups will result in sequential scans of the graph.

-To prepare Titan for IdGraph, first create the `__id` property key. Set the `dataType` of the property key to match the custom IDs that you intend to use. Second, build a unique composite index on the `__id` property key. The following example shows how to define and index the `__id` property key to support IdGraph with string vertex IDs.
+To prepare Titan for ElementIdStrategy, first create the property key. Set the `dataType` of the property key to match the custom IDs that you intend to use. Second, build a unique composite index on the property key. The following example shows how to define and index the property key to support ElementIdStrategy with string vertex IDs.

-[source,gremlin]
+[source, gremlin]
g = TitanFactory.open("berkeleyje:/tmp/test")
-// Define a property key and index for IdGraph-managed vertex IDs
-mgmt = g.getManagementSystem();
-id = mgmt.makePropertyKey(IdGraph.ID).dataType(String.class).make()
-mgmt.buildIndex("byvid",Vertex.class).addKey(id).unique().buildCompositeIndex()
+// Define a property key and index for managed vertex IDs
+mgmt = g.openManagement()
+idKey = mgmt.makePropertyKey("name").dataType(String.class).make()
+mgmt.buildIndex("byName", Vertex.class).addKey(idKey).unique().buildCompositeIndex()
mgmt.commit()
-// Create an IdGraph that manages vertex IDs but not edge IDs
-ig = new IdGraph(g, true, false)
-// Insert example vertex with custom identifier
-hercules = ig.addVertex("hercules")
-g.v("hercules")
-zeus = ig.addVertex("zeus")
+// Create a traversal source that manages vertex IDs but not edge IDs
+strategy = ElementIdStrategy.build().idPropertyKey("name").create()
+ig = GraphTraversalSource.build().with(strategy).create(g)

-// If only user defined ids on vertices (or edges) is needed, then use one of the overloaded `IdGraph` constructors. It is still helpful, although not strictly necessary, to define an index:
-//
-//[source,gremlin]
-//ig = new IdGraph(g, true, false) // true for vertices, false for edges
+// Insert example vertex with custom identifier
+hercules = ig.addV(T.id, "hercules")
+ig.V("hercules")
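Once defined this way, the custom identifier resolves through the unique `byName` index; a short usage sketch (expected output shown as a comment):

[source, gremlin]
// the custom id is stored in the 'name' property and looked up via the index
ig.V("hercules").values("name")
// ==> hercules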
47 changes: 28 additions & 19 deletions docs/advschema.txt
@@ -10,7 +10,7 @@ Static Vertices

Vertex labels can be defined as *static* which means that vertices with that label cannot be modified outside the transaction in which they were created.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
tweet = mgmt.makeVertexLabel('tweet').setStatic().make()
mgmt.commit()
@@ -30,38 +30,38 @@ The following storage backends support vertex and edge label TTL.
Edge TTL
^^^^^^^^

-Edge TTL is defined on a per-edge label basis, meaning that all edges of that label have the same time-to-live.
+Edge TTL is defined on a per-edge label basis, meaning that all edges of that label have the same time-to-live. Note that the backend must support cell-level TTL; currently only Cassandra supports this.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
visits = mgmt.makeEdgeLabel('visits').make()
-mgmt.setTTL(visits,7,TimeUnit.DAYS)
+mgmt.setTTL(visits, 7, DAYS)
mgmt.commit()

Note that modifying an edge resets the TTL for that edge. Also note that the TTL of an edge label can be modified, but it might take some time for this change to propagate to all running Titan instances, which means that two different TTLs can be temporarily in use for the same label.
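For instance (a sketch, assuming the `visits` label defined above and two pre-existing vertices `v1` and `v2`):

[source, gremlin]
e = v1.addEdge('visits', v2)
// any later mutation of the edge restarts the 7-day countdown
e.property('time', System.currentTimeMillis())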

Property TTL
^^^^^^^^^^^^

-Property TTL is very similar to edge TTL and defined on a per-property key basis, meaning that all properties of that key have the same time-to-live.
+Property TTL is very similar to edge TTL and defined on a per-property key basis, meaning that all properties of that key have the same time-to-live. Note that the backend must support cell-level TTL; currently only Cassandra supports this.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
sensor = mgmt.makePropertyKey('sensor').cardinality(Cardinality.LIST).dataType(Double.class).make()
-mgmt.setTTL(sensor,21,TimeUnit.DAYS)
+mgmt.setTTL(sensor, 21, DAYS)
mgmt.commit()

As with edge TTL, modifying an existing property resets the TTL for that property, and modifying the TTL for a property key might not take effect immediately.

Vertex TTL
^^^^^^^^^^

-Vertex TTL is defined on a per-vertex label basis, meaning that all vertices of that label have the same time-to-live. The configured TTL applies to the vertex, its properties, and all incident edges to ensure that the entire vertex is removed from the graph. For this reason, a vertex label must be defined as _static_ before a TTL can be set to rule out any modifications that would invalidate the vertex TTL. Vertex TTL only applies to static vertex labels.
+Vertex TTL is defined on a per-vertex label basis, meaning that all vertices of that label have the same time-to-live. The configured TTL applies to the vertex, its properties, and all incident edges to ensure that the entire vertex is removed from the graph. For this reason, a vertex label must be defined as _static_ before a TTL can be set to rule out any modifications that would invalidate the vertex TTL. Vertex TTL only applies to static vertex labels. Note that the backend must support store-level TTL; currently only Cassandra and HBase support this.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
tweet = mgmt.makeVertexLabel('tweet').setStatic().make()
-mgmt.setTTL(tweet,36,TimeUnit.HOURS)
+mgmt.setTTL(tweet, 36, HOURS)
mgmt.commit()

Note that the TTL of a vertex label can be modified, but it might take some time for this change to propagate to all running Titan instances, which means that two different TTLs can be temporarily in use for the same label.
@@ -71,16 +71,16 @@ Multi-Properties

As discussed in <<schema>>, Titan supports property keys with SET and LIST cardinality. Hence, Titan supports multiple properties with the same key on a single vertex. Furthermore, Titan treats properties similarly to edges in that single-valued property annotations are allowed on properties, as shown in the following example.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.LIST).make()
mgmt.commit()
v = g.addVertex()
-p1 = v.property('name','Dan LaRocque')
-p1.property('source','web')
-p2 = v.property('name','dalaro')
-p2.property('source','github')
-g.commit()
+p1 = v.property('name', 'Dan LaRocque')
+p1.property('source', 'web')
+p2 = v.property('name', 'dalaro')
+p2.property('source', 'github')
+g.tx().commit()
v.properties('name')
==> Iterable over all name properties
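To read the annotations back, each `name` property can be inspected individually (a sketch using the standard TinkerPop property API):

[source, gremlin]
v.properties('name').forEachRemaining { p ->
    // p.value() is the name itself; p.value('source') is its annotation
    println p.value() + ' (source: ' + p.value('source') + ')'
}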

@@ -93,20 +93,29 @@ Unidirected Edges

Unidirected edges are edges that can only be traversed in the out-going direction. Unidirected edges have a lower storage footprint but are limited in the types of traversals they support. Unidirected edges are conceptually similar to hyperlinks on the world wide web in the sense that the out-vertex can traverse through the edge, but the in-vertex is unaware of its existence.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
link = mgmt.makeEdgeLabel('link').unidirected().make()
mgmt.commit()

Unidirected edges can be added on edges and properties, thereby giving Titan limited support for hyper-edges. For example, this can be useful for capturing authorship provenance information for edges as shown in the following example, where we add a unidirected `author` edge on the `knows` edge to store the fact that `user` added this edge to the graph.

-[source,gremlin]
+[source, gremlin]
mgmt = g.openManagement()
mgmt.makeEdgeLabel('author').unidirected().make()
mgmt.commit()
user = g.v(4)
u = g.v(8)
v = g.v(16)
-v.addEdge('knows',u).property('author',user)
+v.addEdge('knows', u).property('author', user)


+mgmt = g.openManagement()
+mgmt.makeEdgeLabel('author').unidirected().make()
+mgmt.commit()
+user = g.addVertex(T.label, 'author')
+book = g.addVertex()
+author = g.addVertex()
+user.addEdge('knows', book).property('author', author)

Note that unidirected edges do not get automatically deleted when their in-vertices are deleted. The user must ensure that such inconsistencies do not arise or resolve them at query time by explicitly checking vertex existence in a transaction. See the discussion in <<ghost-vertices>> for more information.
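A defensive check along these lines (a sketch; `author` is assumed to hold the vertex stored on the `knows` edge above, and `g.vertices` is the TinkerPop 3 Graph API):

[source, gremlin]
// re-resolve the referenced vertex by id; an empty iterator means it was deleted
if (g.vertices(author.id()).hasNext()) {
    // safe to use the author vertex
} else {
    // the unidirected edge references a ghost vertex; skip or repair it
}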
12 changes: 6 additions & 6 deletions docs/bdb.txt
@@ -5,7 +5,7 @@ BerkeleyDB
//[.tss-center.tss-width-250]
//image:http://download.oracle.com/berkeley-db/docs/je/3.2.76/images/Oracle_BerkeleyDB_clr.bmp[link="http://www.oracle.com/technetwork/products/berkeleydb"]

-[quote,'http://www.oracle.com/technetwork/products/berkeleydb[BerkeleyDB Homepage]']
+[quote, 'http://www.oracle.com/technetwork/products/berkeleydb[BerkeleyDB Homepage]']
Berkeley DB enables the development of custom data management
solutions, without the overhead traditionally associated with such
custom projects. Berkeley DB provides a collection of well-proven
@@ -22,11 +22,11 @@ BerkeleyDB Setup

Since BerkeleyDB runs in the same JVM as Titan, connecting the two only requires a simple configuration and no additional setup:

-[source,java]
-TitanGraph g = TitanFactory.build()
-    .set("storage.backend", "berkeleyje")
-    .set("storage.directory", "/tmp/graph")
-    .open();
+[source, java]
+TitanGraph g = TitanFactory.build().
+    set("storage.backend", "berkeleyje").
+    set("storage.directory", "/tmp/graph").
+    open();

In the Gremlin shell, you cannot define the type of the variables `conf` and `g`. Therefore, simply leave off the type declaration.
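For example, in the Gremlin shell the same setup reads:

[source, gremlin]
g = TitanFactory.build().
    set("storage.backend", "berkeleyje").
    set("storage.directory", "/tmp/graph").
    open()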

4 changes: 2 additions & 2 deletions docs/building.txt
@@ -15,7 +15,7 @@ Depending on Titan Snapshots

For developing against the most current version of Titan, depend on Titan snapshot releases. Note that these releases are development releases and therefore unstable and likely to change. Unless one is interested in the most recent development status of Titan, we recommend using the stable Titan release instead.

-[source,xml]
+[source, xml]
<dependency>
<groupId>com.thinkaurelius.titan</groupId>
<artifactId>titan-core</artifactId>
@@ -27,7 +27,7 @@ SNAPSHOTs are available through the https://oss.sonatype.org/content/repositorie

When adding this dependency, be sure to add the following repository to the `pom.xml`:

-[source,xml]
+[source, xml]
<repository>
<id>sonatype-nexus-snapshots</id>
<name>Sonatype Nexus Snapshots</name>
8 changes: 4 additions & 4 deletions docs/bulkloading.txt
@@ -10,7 +10,7 @@ There are a number of configuration options and tools that make ingesting large
There are a number of use cases for bulk loading data into Titan, including:

* Introducing Titan into an existing environment with existing data and migrating or duplicating this data into a new Titan cluster.
* Using Titan as an end point of an http://en.wikipedia.org/wiki/Extract,_transform,_load[ETL] process.
* Adding existing or external graph datasets (e.g. publicly available http://linkeddata.org/[RDF datasets]) to a running Titan cluster.
* Updating a Titan graph with results from a graph analytics job.

@@ -82,12 +82,12 @@ During bulk loading, the load on the cluster typically increases making it more
//Titan-Hadoop
//^^^^^^^^^^^^

-//For very large graphs the best option to load data efficiently is <<hadoop,Titan-Hadoop>> using one of the supported input format and specifying Titan as the output format.
+//For very large graphs the best option to load data efficiently is <<hadoop, Titan-Hadoop>> using one of the supported input formats and specifying Titan as the output format.

//BatchGraph
//^^^^^^^^^^

-//For medium size graph datasets (up to 100s million edges), Blueprints' https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation[BatchGraph] is a useful tool for bulk loading data into Titan from a single machine through Titan's native Blueprints interface. BatchGraph effectively caches externally provided vertex ids to eliminate reads against Titan. This allows bulk loading with minimal read load.
+//For medium size graph datasets (up to 100s of millions of edges), TinkerPop's http://tinkerpop.incubator.apache.org/docs/3.0.0.M8-incubating/#_batchgraph[BatchGraph] is a useful tool for bulk loading data into Titan from a single machine through Titan's native Blueprints interface. BatchGraph effectively caches externally provided vertex ids to eliminate reads against Titan. This allows bulk loading with minimal read load.

//BatchGraph is limited to single machine bulk loading use cases and requires enough local RAM to hold the entire vertex id cache in memory. BatchGraph supports id compression to reduce the memory requirements. Please refer to the https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation[BatchGraph documentation] for more information on how to use BatchGraph most effectively.

@@ -112,7 +112,7 @@ If Hadoop cannot be used for parallelizing the bulk loading process, here are so
Data Sorting
^^^^^^^^^^^^

-Presorting the data to be bulk loaded can significantly increase the loading performance through BatchGraph. The https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation[BatchGraph] documentation describes this strategy in more detail. It has been reported that loading times were decreased by a factor of 2 or more when presorting the bulk loaded data.
+Presorting the data to be bulk loaded can significantly increase the loading performance through BatchGraph. The http://tinkerpop.incubator.apache.org/docs/3.0.0-SNAPSHOT/#_batchgraph[BatchGraph] documentation describes this strategy in more detail. It has been reported that loading times were decreased by a factor of 2 or more when presorting the bulk loaded data.
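As a rough illustration (a sketch; the file name and format are hypothetical, assuming one whitespace-delimited edge per line):

[source, gremlin]
// sort the edge list by out-vertex id so all edges incident to the same
// vertex are loaded together
edges = new File('edges.txt').readLines().
    collect { it.split(/\s+/) }.
    sort { it[0] }
edges.each { pair ->
    // load edge pair[0] -> pair[1] here, e.g. via addEdge on cached vertices
}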

Q&A
~~~
