
Make Titan ID allocation more robust for parallel bulk ingestion #382

Closed
mbroecheler opened this issue Sep 25, 2013 · 8 comments

@mbroecheler
Member

I'm running an import of both SequenceFile and GraphSON graphs using Faunus 0.3.2 into HBase with TitanHBaseOutputFormat.

The graphs are already stored in HDFS on the same cluster as my HBase cluster. What I'm running into are errors like the following:

attempt_201309101345_0012_m_000140_0: Exception in thread "Thread-15" com.thinkaurelius.titan.core.TitanException: Could not acquire new ID block from storage
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.renewBuffer(StandardIDPool.java:116)
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.access$100(StandardIDPool.java:14)
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool$IDBlockThread.run(StandardIDPool.java:171)
attempt_201309101345_0012_m_000140_0: Caused by: com.thinkaurelius.titan.diskstorage.locking.TemporaryLockingException: Exceeded timeout count [4] when attempting to allocate id block
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.diskstorage.idmanagement.ConsistentKeyIDManager.getIDBlock(ConsistentKeyIDManager.java:191)
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.renewBuffer(StandardIDPool.java:110)
attempt_201309101345_0012_m_000140_0: ... 2 more

I went to investigate the StandardIDPool in Titan, and I'm trying to figure out why the ID allocation process was designed that way rather than automatically assigning something like a UUID. The current implementation appears to produce an intermittent deadlock when working with larger graphs (10 million+ vertices). My cluster is 24 nodes with 576 mappers. In the second of the two MapReduce jobs that Faunus compiles down to, the mapper (MapSequence[com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Map, com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Reduce]) drives all 576 mappers up to 100%, and then they sit waiting to negotiate their ID pools and never complete. A few get through and finish successfully, but most time out.

I've read the pointers on tweaking the configuration options here: https://github.com/thinkaurelius/titan/wiki/Bulk-Loading, but they made little to no difference in the outcome. I then raised the task timeout in my mapred-site.xml above the default 600 seconds, which helped somewhat, but unless the timeout was set absurdly high the job would still die.
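For context, the kind of tweaks that wiki page suggests look roughly like the sketch below. This is only a sketch: the property keys and values are my best recollection for Titan 0.3.x (ids.block-size, storage.idauthority-wait-time, and ids.renew-timeout may be spelled slightly differently in your version), and the hostname is a placeholder.

    import org.apache.commons.configuration.BaseConfiguration;
    import org.apache.commons.configuration.Configuration;
    import com.thinkaurelius.titan.core.TitanFactory;
    import com.thinkaurelius.titan.core.TitanGraph;

    public class BulkLoadConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new BaseConfiguration();
            conf.setProperty("storage.backend", "hbase");
            conf.setProperty("storage.hostname", "zk-host");          // placeholder ZooKeeper/HBase host
            conf.setProperty("storage.batch-loading", true);          // relax consistency checks during bulk loads
            conf.setProperty("ids.block-size", 100000);               // bigger blocks => fewer allocation rounds per mapper
            conf.setProperty("storage.idauthority-wait-time", 1000);  // ms to wait when claiming the id-authority lock (approximate key)
            conf.setProperty("ids.renew-timeout", 120000);            // ms before giving up on renewing an id block (approximate key)

            TitanGraph graph = TitanFactory.open(conf);
            // ... add vertices/edges here ...
            graph.shutdown();
        }
    }

When the load goes through Faunus rather than the native API, the same Titan keys are (as far as I understand) passed through the job properties with the faunus.graph.output.titan. prefix.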

So I guess my questions are as follows:

  1. Is there any reason the ID allocation process couldn't use UUIDs rather than the current block-allocation scheme?
  2. Is there something I'm missing in the configuration that would make this behave more nicely?
  3. Are others seeing this with larger graphs? I assume it would happen regardless of whether the backend is HBase or Cassandra.
@ghost ghost assigned mbroecheler Sep 25, 2013
@cglewis

cglewis commented Sep 25, 2013

Thanks for creating this issue @mbroecheler.

Anyone who would like further details about what I'm running into, feel free to ping me.

@jschott780

We've seen the same behavior with Titan 0.3.1, hand-rolled MapReduce, and Cassandra. Our graph is ~15 billion vertices and is expected to grow to 50 billion very soon. In addition, our graph data tends to churn a couple of times a week, resulting in lots of ETL, so this is a fairly critical issue for us.

Using UUIDs would solve a couple of problems we expect to encounter with IDs. We already have UUIDs for our graph, and it's causing some additional bookkeeping to sync between our external UUID data source and Titan's auto-assigned IDs. I also suspect you could get a significant performance boost if, when storage.batch-loading = true, the end user were allowed to specify their own UUIDs (i.e. through the Blueprints API). Self-assigned UUIDs would be awesome for us.
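To be concrete, something along these lines is what we'd hope to write. This is purely hypothetical: Blueprints lets you pass an id hint to addVertex, but Titan currently ignores it and allocates its own long id from the pool, which is exactly the bookkeeping we're trying to avoid. Storing the UUID as a property is the workaround we use today.

    import java.util.UUID;
    import com.tinkerpop.blueprints.Graph;
    import com.tinkerpop.blueprints.Vertex;

    public class SelfAssignedIdSketch {
        // Hypothetical usage: register a vertex under an externally generated UUID.
        public static Vertex loadVertex(Graph graph, UUID externalId, String name) {
            // Blueprints accepts an id hint, but today's Titan ignores it and
            // pulls a long id from the id pool instead.
            Vertex v = graph.addVertex(externalId);
            // Current workaround: carry the external UUID as a regular property.
            v.setProperty("uuid", externalId.toString());
            v.setProperty("name", name);
            return v;
        }
    }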

@mbroecheler
Member Author

This ticket is about making the ID assignment more robust to high concurrency. Whether Titan should allow UUIDs is a separate issue which I moved to ticket #383.

@jschott780

I should have been clearer in my response. Apologies for that.

We are seeing the same timeout issues related to ID allocation. We were running ~1200-2300 mappers to ingest vertices across 86 hosts. It appeared to overwhelm Cassandra as well as Titan. We scaled it back to ~500 mappers with no luck.

@mbroecheler
Member Author

Exactly, we will fix that.


@kdatta

kdatta commented Sep 28, 2013

Matthias, I see similar exceptions while bulk loading vertices to Titan+HBase. Is this open issue resolved?

@cglewis

cglewis commented Oct 25, 2013

+1

@mbroecheler
Member Author

Sorry, my bad, I used the wrong issue number in my last commit. This issue has not yet been closed. We are actively working on this and you should expect a resolution soon. Just not yet ;-)
