
Make Titan ID allocation more robust for parallel bulk ingestion #382

Closed
mbroecheler opened this issue Sep 25, 2013 · 8 comments

@mbroecheler
Member

I'm running an import of both SequenceFile and GraphSON graphs using Faunus 0.3.2 into HBase with TitanHBaseOutputFormat.

The graphs are already stored in HDFS on the same cluster as my HBase cluster. What I'm running into are errors like the following:

attempt_201309101345_0012_m_000140_0: Exception in thread "Thread-15" com.thinkaurelius.titan.core.TitanException: Could not acquire new ID block from storage
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.renewBuffer(StandardIDPool.java:116)
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.access$100(StandardIDPool.java:14)
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool$IDBlockThread.run(StandardIDPool.java:171)
attempt_201309101345_0012_m_000140_0: Caused by: com.thinkaurelius.titan.diskstorage.locking.TemporaryLockingException: Exceeded timeout count [4] when attempting to allocate id block
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.diskstorage.idmanagement.ConsistentKeyIDManager.getIDBlock(ConsistentKeyIDManager.java:191)
attempt_201309101345_0012_m_000140_0: at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.renewBuffer(StandardIDPool.java:110)
attempt_201309101345_0012_m_000140_0: ... 2 more

I went to investigate the StandardIDPool in Titan, and I'm trying to figure out why the ID allocation process was designed that way rather than automatically assigning something like a UUID. The current implementation appears to produce an intermittent deadlock when working with larger graphs (10 million+ vertices). My cluster is 24 nodes with 576 mappers. In the second of the two MapReduce jobs that Faunus compiles down to, the mapper (MapSequence[com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Map, com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Reduce]) drives all 576 mappers up to 100%, and then they sit waiting to negotiate their ID pools and never complete. A few get through and finish successfully, but most time out.

I've read the pointers on tweaking the configuration options here: https://github.com/thinkaurelius/titan/wiki/Bulk-Loading, but they made little to no difference in the outcome. I then raised the task timeout in my mapred-site.xml above the default 600 seconds, which helped somewhat, but unless the timeout was set absurdly high the job would still die.
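For context, the kind of tweaks that wiki page suggests look roughly like the sketch below. This is only a sketch: the property keys and values are my best recollection for Titan 0.3.x (ids.block-size, storage.idauthority-wait-time, and ids.renew-timeout may be spelled slightly differently in your version), and the hostname is a placeholder.

    import org.apache.commons.configuration.BaseConfiguration;
    import org.apache.commons.configuration.Configuration;
    import com.thinkaurelius.titan.core.TitanFactory;
    import com.thinkaurelius.titan.core.TitanGraph;

    public class BulkLoadConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new BaseConfiguration();
            conf.setProperty("storage.backend", "hbase");
            conf.setProperty("storage.hostname", "zk-host");          // placeholder ZooKeeper/HBase host
            conf.setProperty("storage.batch-loading", true);          // relax consistency checks during bulk loads
            conf.setProperty("ids.block-size", 100000);               // bigger blocks => fewer allocation rounds per mapper
            conf.setProperty("storage.idauthority-wait-time", 1000);  // ms to wait when claiming the id-authority lock (approximate key)
            conf.setProperty("ids.renew-timeout", 120000);            // ms before giving up on renewing an id block (approximate key)

            TitanGraph graph = TitanFactory.open(conf);
            // ... add vertices/edges here ...
            graph.shutdown();
        }
    }

When the load goes through Faunus rather than the native API, the same Titan keys are (as far as I understand) passed through the job properties with the faunus.graph.output.titan. prefix.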

So I guess my questions are as follows:

  1. Is there any reason the ID allocation process couldn't use UUIDs rather than the current block-allocation scheme?
  2. Is there something I'm missing in the configuration that would make this behave more nicely?
  3. Are others seeing this with larger graphs? I assume it would happen regardless of whether the backend is HBase or Cassandra.
@ghost ghost assigned mbroecheler Sep 25, 2013
@cglewis

cglewis commented Sep 25, 2013

Thanks for creating this issue @mbroecheler.

Anyone who would like further details about what I'm running into, feel free to ping me.

@jschott780

We've seen the same behavior with Titan 0.3.1, hand-rolled MapReduce, and Cassandra. Our graph is ~15 billion vertices and is expected to grow to 50 billion very soon. In addition, our graph data tends to churn a couple of times a week, resulting in lots of ETL, so this is a fairly critical issue for us.

Using UUIDs would solve a couple of problems we expect to encounter with IDs. We already have UUIDs for our graph, and it's causing some additional bookkeeping to sync between our external UUID data source and Titan's auto-assigned IDs. I also suspect you could get a significant performance boost if, when storage.batch-loading = true, the end user were allowed to specify their own UUIDs (i.e. through the Blueprints API). Self-assigned UUIDs would be awesome for us.
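To be concrete, something along these lines is what we'd hope to write. This is purely hypothetical: Blueprints lets you pass an id hint to addVertex, but Titan currently ignores it and allocates its own long id from the pool, which is exactly the bookkeeping we're trying to avoid. Storing the UUID as a property is the workaround we use today.

    import java.util.UUID;
    import com.tinkerpop.blueprints.Graph;
    import com.tinkerpop.blueprints.Vertex;

    public class SelfAssignedIdSketch {
        // Hypothetical usage: register a vertex under an externally generated UUID.
        public static Vertex loadVertex(Graph graph, UUID externalId, String name) {
            // Blueprints accepts an id hint, but today's Titan ignores it and
            // pulls a long id from the id pool instead.
            Vertex v = graph.addVertex(externalId);
            // Current workaround: carry the external UUID as a regular property.
            v.setProperty("uuid", externalId.toString());
            v.setProperty("name", name);
            return v;
        }
    }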

@mbroecheler
Member Author

This ticket is about making the ID assignment more robust to high concurrency. Whether Titan should allow UUIDs is a separate issue which I moved to ticket #383.

@jschott780

I should have been clearer in my response. Apologies for that.

We are seeing the same timeout issues related to ID allocation. We were running ~1200-2300 mappers to ingest vertices across 86 hosts. It appeared to overwhelm Cassandra as well as Titan. We scaled it back to ~500 mappers with no luck.

@mbroecheler
Member Author

Exactly, we will fix that.


@kdatta

kdatta commented Sep 28, 2013

Matthias, I see similar exceptions while bulk loading vertices to Titan+HBase. Is this open issue resolved?

@cglewis

cglewis commented Oct 25, 2013

+1

@mbroecheler
Member Author

Sorry, my bad, I used the wrong issue number in my last commit. This issue has not yet been closed. We are actively working on this and you should expect a resolution soon. Just not yet ;-)
