Make Titan ID allocation more robust for parallel bulk ingestion #382
Comments
Thanks for creating this issue @mbroecheler. Feel free to ping me if you would like further details about the issue I'm running into.
We've seen the same behavior with Titan 0.3.1, hand-rolled MapReduce, and Cassandra. Our graph is ~15 billion vertices and is expected to grow to 50 billion very soon. In addition, our graph data tends to churn a couple of times a week, resulting in lots of ETL, so this is a fairly critical issue for us. Using UUIDs would solve a couple of problems we expect to encounter with IDs. We already have UUIDs for our graph, and it causes additional bookkeeping to keep our external UUID data source in sync with Titan's auto-assigned IDs. I also suspect you could get a significant performance boost if, when storage.batch-loading = true, end users could specify their own UUIDs (i.e. through the Blueprints API). Self-assigned UUIDs would be awesome for us.
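The bookkeeping mentioned above, keeping an external UUID in sync with the long ID Titan assigns on commit, might be sketched as a small bidirectional map maintained during ingest. This is a hypothetical helper for illustration only, not part of the Titan or Blueprints API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper (not part of Titan): tracks the pairing between an
// externally owned UUID and the long ID Titan assigns to a vertex.
class IdRegistry {
    private final Map<UUID, Long> uuidToTitan = new HashMap<>();
    private final Map<Long, UUID> titanToUuid = new HashMap<>();

    // Record the pairing once Titan has assigned an ID to the vertex.
    void register(UUID externalId, long titanId) {
        uuidToTitan.put(externalId, titanId);
        titanToUuid.put(titanId, externalId);
    }

    Long lookupTitanId(UUID externalId) {
        return uuidToTitan.get(externalId);
    }

    UUID lookupUuid(long titanId) {
        return titanToUuid.get(titanId);
    }
}
```

Every vertex written therefore costs an extra pair of map entries (or, at this scale, an extra external key-value store), which is exactly the overhead that self-assigned IDs would remove.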
This ticket is about making ID assignment more robust under high concurrency. Whether Titan should allow UUIDs is a separate question, which I have moved to ticket #383.
I should have been clearer in my response. Apologies for that. We are seeing the same timeout issues related to ID allocation. We were running ~1200-2300 mappers ingesting vertices across 86 hosts, which appeared to overwhelm both Cassandra and Titan. We scaled back to ~500 mappers with no luck.
Exactly, we will fix that.

Matthias Broecheler
Matthias, I see similar exceptions while bulk loading vertices into Titan+HBase. Has this open issue been resolved?
+1 |
Sorry, my bad, I used the wrong issue number in my last commit. This issue has not yet been closed. We are actively working on this and you should expect a resolution soon. Just not yet ;-)
I'm running an import of both sequencefiles and graphson files using Faunus 0.3.2 into HBase using the TitanHBaseOutputFormat format.
The graphs are already stored in HDFS on the same cluster as my HBase cluster. What I'm running into are errors like the following:
attempt_201309101345_0012_m_000140_0: Exception in thread "Thread-15" com.thinkaurelius.titan.core.TitanException: Could not acquire new ID block from storage
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.renewBuffer(StandardIDPool.java:116)
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.access$100(StandardIDPool.java:14)
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool$IDBlockThread.run(StandardIDPool.java:171)
attempt_201309101345_0012_m_000140_0: Caused by: com.thinkaurelius.titan.diskstorage.locking.TemporaryLockingException: Exceeded timeout count [4] when attempting to allocate id block
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.diskstorage.idmanagement.ConsistentKeyIDManager.getIDBlock(ConsistentKeyIDManager.java:191)
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.renewBuffer(StandardIDPool.java:110)
attempt_201309101345_0012_m_000140_0: ... 2 more
I went to investigate StandardIDPool in Titan, and I'm trying to figure out why the ID allocation process was designed that way rather than automatically assigning something like a UUID. The current implementation seems to cause an intermittent deadlock when working with larger graphs (10 million+ vertices). My cluster is 24 nodes with 576 mappers. In the second of the two MapReduce jobs that Faunus compiles down into (MapSequence[com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Map, com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Reduce]), all 576 mappers run up to 100% and then sit waiting to negotiate their ID pools, never completing. A few get through and complete successfully, but most time out.
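For context on why mappers "negotiate their ID pools" at all: as I understand it, each Titan instance claims a contiguous block of IDs from a shared, storage-backed counter, then hands out IDs locally from that block; the TemporaryLockingException in the trace above fires when claiming a fresh block repeatedly times out under contention. A rough, simplified sketch of the block-claiming pattern follows. The class and method names are invented, and the shared counter is a local AtomicLong standing in for the storage backend, so the contention and timeouts of the real system are not modeled here:

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of block-based ID allocation (not Titan's actual code).
// Each worker claims a contiguous block [start, start + blockSize) from a
// shared counter, then serves IDs locally until the block is exhausted.
class BlockIdPool {
    private final AtomicLong globalCounter; // stands in for the storage-backed counter
    private final long blockSize;
    private long next;
    private long blockEnd;

    BlockIdPool(AtomicLong globalCounter, long blockSize) {
        this.globalCounter = globalCounter;
        this.blockSize = blockSize;
        this.next = 0;
        this.blockEnd = 0; // forces a block claim on first use
    }

    long nextId() {
        if (next >= blockEnd) {
            // In the real system this step goes through the storage backend
            // and can time out under heavy contention; here it is a local CAS.
            long start = globalCounter.getAndAdd(blockSize);
            next = start;
            blockEnd = start + blockSize;
        }
        return next++;
    }
}
```

With hundreds of mappers all exhausting their blocks at roughly the same rate, the claim step in the real system becomes a hot spot on the backend, which is consistent with mappers stalling at 100% while waiting on new blocks.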
I've read the pointers here: https://github.com/thinkaurelius/titan/wiki/Bulk-Loading about tweaking the configuration options, but that has made little to no difference in the outcome. I then raised the task timeout in my mapred-site configuration above the default 600 seconds, which helped, but unless the timeouts were ridiculously high, the jobs would still die.
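For reference, the knobs that wiki page discusses center on making ID blocks larger and waits longer, so that workers return to the ID authority less often and tolerate more contention when they do. A sketch of the relevant properties, with names as I understand them from the 0.3-era documentation and values that are purely illustrative rather than recommendations:

```properties
# Illustrative values only -- tune for your own cluster.
storage.batch-loading=true
# Larger blocks mean each worker returns to the ID authority less often.
ids.block-size=100000
# Milliseconds to wait on the ID authority before retrying a block claim.
storage.idauthority-wait-time=1000
```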
So I guess my questions are as follows: