There are various limitations and “gotchas” to be aware of when using Titan. Some of these limitations are deliberate design choices; others are issues that will be rectified as Titan development continues. The last section provides solutions to common issues.
Titan can store up to a quintillion edges (2^60) and half as many vertices. That limitation is imposed by Titan’s id scheme.
When declaring the data type of a property key using
dataType(Class) Titan will enforce that all properties for that key have the declared type, unless that type is
Object.class. This is an equality type check, meaning that sub-classes will not be allowed. For instance, one cannot declare the data type to be
Number.class and use
Long. For efficiency reasons, the type needs to match exactly. Hence, use
Object.class as the data type for type flexibility. In all other cases, declare the actual data type to benefit from increased performance and type safety.
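The consequence of the exact-match rule can be illustrated with plain Java class comparisons (this is an illustration of the equality semantics, not Titan API code):

```java
// Why a Long value fails a declared Number.class data type: the check is
// exact class equality, not an instanceof/assignability test.
public class ExactTypeCheck {
    public static void main(String[] args) {
        Class<?> declared = Number.class;
        Class<?> actual = Long.valueOf(42L).getClass(); // java.lang.Long

        System.out.println(declared.equals(actual));           // false: exact match fails
        System.out.println(declared.isAssignableFrom(actual)); // true: Long is a Number
    }
}
```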
Retrieving an edge by id, e.g.,
tx.getEdge(edge.getId()), is not a constant time operation. Titan will retrieve an adjacent vertex of the edge to be retrieved and then execute a vertex query to identify the edge. The former is constant time but the latter is potentially linear in the number of edges incident on the vertex with the same edge label.
This also applies to index retrievals for edges via a standard or external index.
To index vertices or edges by key, the respective key index must be created before the key is first used in a vertex or edge property. Read more about creating vertex indexes.
Once an index has been created for a key, it can never be removed.
This pitfall constrains the graph schema. While the graph schema can be extended, previous declarations cannot be changed.
Titan provides a batch loading mode that can be enabled through the configuration. However, this batch mode only facilitates faster loading into the storage backend, it does not use storage backend specific batch loading techniques that prepare the data in memory for disk storage. As such, batch loading in Titan is currently slower than batch loading modes provided by single machine databases. The Bulk Loading documentation lists ways to speed up batch loading in Titan.
Another limitation related to batch loading is the failure to load millions of edges into a single vertex at once or in a short period of time. Such supernode loading can fail for some storage backends. This limitation also applies to dense index entries. For more information, please refer to the ticket.
Running multiple Titan instances on one machine backed by the same storage backend (distributed or local) requires that each of these instances has a unique configuration for
storage.machine-id-appendix. Otherwise, these instances might overwrite each other leading to data corruption. See Graph Configuration for more information.
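For example, two instances sharing one machine could be configured as follows (a hypothetical properties fragment; the values are illustrative):

```properties
# Instance A's configuration file:
storage.machine-id-appendix=0
# Instance B's configuration file:
storage.machine-id-appendix=1
```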
By default, Titan will automatically create property keys and edge labels when a new type is encountered. It is strongly encouraged that users explicitly define types and disable automatic type creation by setting the graph configuration option
autotype = none.
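A hypothetical configuration fragment disabling automatic type creation might look like:

```properties
# Disable automatic type creation; using an undeclared property key or
# edge label will then fail instead of silently creating the type.
autotype = none
```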
Titan supports arbitrary objects as attribute values on properties. To use a custom class as data type in Titan, either register a custom serializer or ensure that the class has a no-argument constructor and implements the
equals method because Titan will verify that it can successfully de-/serialize objects of that class. Please read Datatype and Attribute Serializer Configuration for more information.
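A sketch of a custom attribute class that satisfies these default requirements (the class name and fields are hypothetical):

```java
import java.util.Objects;

// A custom attribute class with a no-argument constructor (needed for
// deserialization) and a meaningful equals(). hashCode() is overridden
// alongside equals() as a matter of good practice.
public class GeoPoint {
    private double latitude;
    private double longitude;

    public GeoPoint() {}  // no-arg constructor required by Titan's default handling

    public GeoPoint(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof GeoPoint)) return false;
        GeoPoint p = (GeoPoint) other;
        return latitude == p.latitude && longitude == p.longitude;
    }

    @Override
    public int hashCode() {
        return Objects.hash(latitude, longitude);
    }
}
```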
Edges should not be accessed outside the scope in which they were originally created or retrieved.
When defining unique Titan types with locking enabled (i.e. requesting that Titan ensures uniqueness) it is likely to encounter locking exceptions of the type
PermanentLockingException under concurrent modifications to the graph.
Such exceptions are to be expected, since Titan cannot know how to recover from a transactional state in which a previously read value has been modified by another transaction, as this may invalidate the state of the transaction. In most cases it is sufficient to simply re-run the transaction. If locking exceptions are very frequent, try to analyze and remove the source of congestion.
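Re-running the transaction can be wrapped in a small retry loop. A minimal sketch in plain Java (the helper name is hypothetical, and a generic exception stands in for PermanentLockingException):

```java
import java.util.concurrent.Callable;

public class RetryExample {
    // Re-run the transaction body up to maxAttempts times. In real code the
    // Callable would open a Titan transaction, apply the mutations, and
    // commit; the exception to catch would be PermanentLockingException.
    static <T> T retry(Callable<T> tx, int maxAttempts) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return tx.call();
            } catch (Exception e) {  // PermanentLockingException in real code
                last = e;
            }
        }
        // Persistent failure usually indicates congestion on a hot lock.
        throw new RuntimeException("retries exhausted; analyze congestion", last);
    }
}
```

Because each retry starts from a fresh transactional state, an earlier conflicting read no longer poisons the re-run.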
Titan internally represents
Double and
Float data types as fixed decimal numbers. Doubles are stored with up to 6 decimal digits and floats with up to 3. This representation enables range retrievals in vertex centric queries. However, it significantly limits the precision and range of doubles and floats. Use
FullDouble or
FullFloat as the data type to get the full precision of floating point numbers. However, note that these data types cannot be used in range-constrained vertex centric queries.
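The effect of a fixed decimal representation can be illustrated in plain Java (this mimics the 3-digit case for floats; it is not Titan's internal code):

```java
// Round-trip a value through a fixed decimal representation with a given
// number of decimal digits: everything beyond those digits is lost.
public class FixedDecimal {
    static long toFixed(double value, int digits) {
        return Math.round(value * Math.pow(10, digits));
    }
    static double fromFixed(long fixed, int digits) {
        return fixed / Math.pow(10, digits);
    }
    public static void main(String[] args) {
        double stored = fromFixed(toFixed(3.14159, 3), 3);
        System.out.println(stored); // 3.142 -- digits beyond the third are gone
    }
}
```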
When the same vertex is concurrently removed in one transaction and modified in another, both transactions will successfully commit on eventually consistent storage backends, and the vertex will still exist with only the modified properties or edges. This is referred to as a ghost vertex. It is possible to guard against ghost vertices on eventually consistent backends using key out-uniqueness, but this is prohibitively expensive in most cases. A more scalable approach is to allow ghost vertices temporarily and clear them out at regular intervals, for instance using Titan tools.
Another option is to detect them at read time using the transaction configuration option
checkInternalVertexExistence().
Cassandra 1.2.x makes use of Snappy 1.4. Titan will not be able to connect to Cassandra if the server is running Java 1.7 and Cassandra 1.2.x (with Snappy 1.4). Be sure to remove the Snappy 1.4 jar in the
cassandra/lib directory and replace with a Snappy 1.5 jar version (available here).
When the log level is set to
debug, Titan produces a large amount of logging output, which is useful for understanding how particular queries get compiled, optimized, and executed. However, the output is so large that it will noticeably impact query performance. Hence, use log level
info or above for production systems or benchmarking.
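A hypothetical log4j.properties fragment for a production setup (appender names are illustrative):

```properties
# Keep logging at INFO or above to avoid the overhead of debug output.
log4j.rootLogger=INFO, stdout
```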
If you experience memory issues or excessive garbage collection while running Titan it is likely that the caches are configured incorrectly. If the caches are too large, the heap may fill up with cache entries. Try reducing the size of the transaction level cache before tuning the database level cache, in particular if you have many concurrent transactions. Read more about Titan's caching layers.
When launching Titan with embedded Cassandra, the following warnings may be displayed:
958 [MutationStage:25] WARN org.apache.cassandra.db.Memtable - MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0. Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead
Cassandra uses a Java agent called
MemoryMeter which allows it to measure the actual memory use of an object, including JVM overhead. To use JAMM (Java Agent for Memory Measurements), the path to the JAMM jar must be specified in the javaagent parameter when launching the JVM (e.g.
-javaagent:path/to/jamm.jar). Rather than modifying
titan.sh and adding the javaagent parameter, it is easier to set the
JAVA_OPTIONS environment variable with the proper javaagent setting:
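For example (shell syntax; the jar path is the placeholder used above):

```shell
# Pass the JAMM agent to the JVM without modifying titan.sh.
export JAVA_OPTIONS="-javaagent:path/to/jamm.jar"
echo "$JAVA_OPTIONS"
```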
By default, Titan uses the Astyanax library to connect to Cassandra clusters. On EC2 and Rackspace, it has been reported that Astyanax was unable to establish a connection to the cluster. In those cases, changing the backend to
storage.backend=cassandrathrift solved the problem.
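The corresponding configuration change might look like this (a properties file fragment):

```properties
# Use the Thrift-based connector instead of the default Astyanax one.
storage.backend=cassandrathrift
```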
When numerous clients are connecting to ElasticSearch, it is likely that an
OutOfMemoryException occurs. This is not due to a memory issue, but to the OS not allowing more threads to be spawned by the user (the user running ElasticSearch). To circumvent this issue, increase the number of allowed processes to the user running ElasticSearch. For example, increase the
ulimit -u from the default 1024 to 10024.
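A hypothetical /etc/security/limits.conf fragment raising the limit for a user named elasticsearch (user name and values are illustrative):

```text
# soft and hard limits on the number of user processes (nproc)
elasticsearch soft nproc 10024
elasticsearch hard nproc 10024
```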