Loading millions of edges at once on a single vertex causes timeouts in Cassandra #11

Open
mbroecheler opened this Issue Jun 6, 2012 · 22 comments

@mbroecheler
Member

mbroecheler commented Jun 6, 2012

When loading a million edges that are all incident on the same vertex into a TitanGraph backed by a Cassandra cluster with more than one node, an internal Cassandra timeout occurs and aborts the loading process.
This behavior is specific to having such a "supernode" with a very large number of incident edges and loading all of those edges at once. Whether the problem is observed also depends on the hardware: on some systems, increasing the RPC timeout parameter solves the issue; on others it does not.

Member

mbroecheler commented Jun 14, 2012

This limitation also applies to dense index entries, i.e. if one is loading millions of properties with the same indexed value, then that creates a dense list of entries under that index entry. In these cases, failure in the storage backend may occur.

protheusfr commented Feb 21, 2013

Hello Matthias,

I think I have observed this:
I am currently trying to write a log collector that injects events into a Titan graph backed by Cassandra (1.8).
For one vertex (i.e. one server or one firewall appliance) we can add millions of events (especially for firewalls).

For initial tests, I tried to parse and insert a few log files (about 5 million events per file).
Each log file concerns a single piece of equipment, so loading a file consists of creating a "Server" vertex (if it does not already exist) and linking "Event" edges (millions of them) to it in a single transaction.

After the event-loading loop, I call:
g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);
g.shutdown();

and obtain:

120883 [main] DEBUG com.thinkaurelius.titan.graphdb.database.StandardTitanGraph - Saving transaction. Added 6000035, removed 0
127036 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
127689 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
135757 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
....
144237 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Connection reset
...

145648 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
146081 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
146563 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
146998 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
147586 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
148008 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
...
etc

and g.stopTransaction() never seems to finish.

I have been looking, without success, for an elegant way to save the transaction in blocks without stopping it (a kind of g.saveTransaction()).

Any idea?

Member

mbroecheler commented Feb 21, 2013

Hey,

I am not sure exactly what you mean by g.saveTransaction()? I think the best way is to divide your transaction into smaller chunks, i.e. load only a few thousand edges per transaction. That way, you are much less likely to encounter the kind of timeout and buffer exceptions that you are getting.

HTH,
Matthias
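
As a rough illustration of this chunking approach, the following sketch uses the same Blueprints TransactionalGraph API that appears elsewhere in this thread; the class name, BATCH_SIZE, the "event" edge label and the property names are illustrative assumptions, not anything prescribed by Titan.

// Sketch: commit in chunks of a few thousand edges instead of one huge transaction
// (assumes the Blueprints 2.x / Titan 0.2-era API used in this thread).
import com.thinkaurelius.titan.core.TitanGraph;
import com.tinkerpop.blueprints.TransactionalGraph;
import com.tinkerpop.blueprints.Vertex;

public class ChunkedEdgeLoader {
    private static final int BATCH_SIZE = 10000; // "a few thousand edges per transaction"

    public static void load(TitanGraph g, Object serverId, Iterable<String> messages) {
        Vertex server = g.getVertex(serverId);
        long count = 0;
        for (String message : messages) {
            Vertex event = g.addVertex(null);
            event.setProperty("message", message);
            g.addEdge(null, server, event, "event");
            if (++count % BATCH_SIZE == 0) {
                // flush this chunk to the storage backend; the next operation starts a fresh transaction
                g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);
                // re-read the supernode so we do not keep using a reference from the old transaction
                server = g.getVertex(serverId);
            }
        }
        g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);
    }
}

Committing every few thousand edges keeps each mutation batch small enough to stay under Cassandra's RPC timeout, which is essentially what protheusfr ends up doing below (committing every 100,000 inserts).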


protheusfr commented Feb 21, 2013

"I am not sure exactly what you mean by g.saveTransaction()? "
My understanding of Cassandra backend implementation (only based on network traffic observation during transaction between client and cassandra servers) is :

  • When we add Vertex or Edge on TitanGraph object, they build the graph on memory,
  • The TitanGraph object only check/reserve IDs availability on Keyspace,
  • On call of .stopTransaction() this memory mapped graph is flushed on Cassandra Keyspace.

This may be a misunderstanding of my part, but if not, it could be interesting to allow developer to decide when datas must be "flushed" to Cassandra Keyspace in order to avoid this kind of buffer timeout.

Feel free to correct me my vision of Cassandra's TitanGraph implementation is incorrect.

P.S. I use "storage.batch-loading" option

Member

mbroecheler commented Feb 22, 2013

Yes, your understanding of how Titan operates is correct. However, only the modifications of the in-memory graph get flushed back to Cassandra.

Wouldn't the developer always want to flush all data? Otherwise they would lose changes in their transaction.


protheusfr commented Feb 22, 2013

Yes, they would be flushed. I've solved this problem by calling g.saveTransaction() / g.startTransaction() after every 100,000 "Event" edge inserts during the loading process.
So now my situation is:

  • I have one vertex (my supernode) with about 1.2 million connected vertices,
  • it is backed by a 7-node Cassandra cluster (spread over two geographically separate sites and using RackInferringSnitch).

I try to query this graph through Gremlin:

g = TitanFactory.open("./cassandra.conf"); 13/02/22 19:10:06 INFO impl.ConnectionPoolMBeanManager: Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=TitanConnectionPool,ServiceType=connectionpool
==>titangraph[cassandra:cnode03.prod.xxxxx.com]
gremlin> g.V('name','srx3600.interco.dc1.xxxxx.com').outE('srx_deny').interval('time',1360798508000,1360798509000).inV().message

and... I think I am hitting the RPC timeout mentioned above, but how can we verify that? Gremlin does not seem to log anything.

Member

mbroecheler commented Feb 25, 2013

Meaning, you get no additional output? It just hangs forever?


protheusfr commented Feb 25, 2013

Absolutely: no additional output on the Gremlin client and no relevant log entries on the Cassandra servers.


protheusfr commented Feb 25, 2013

I took a look at the network traffic between the Gremlin client and the Cassandra servers.
After a while, it seems to loop on the "describe_ring" command:

describe_ring..................'130386982117462736970068843678458264655......'133105607137018827020446003034043231743............10.10.117.1....10.10.107.1....10.20.101.3............10.10.117.1....10.10.107.1....10.20.101.3...............10.10.117.1.......10.......117........10.10.107.1.......10.......107........10.20.101.3.......20.......101........&45316390387228121104225191820516211791......&48035015406784211154602351176101178879............10.10.117.2....10.10.107.2....10.20.101.1............10.10.117.2....10.10.107.2....10.20.101.1...............10.10.117.2.......10.......117........10.10.107.2.......10.......107........10.20.101.1.......20.......101........&24114189515438479738060920219230794689......&45316390387228121104225191820516211791............10.20.101.2....10.10.117.2....10.10.107.2............10.20.101.2....10.10.117.2....10.10.107.2...............10.20.101.2.......20.......101........10.10.117.2.......10.......117........10.10.107.2.......10.......107........&90570311271901519087524177105072205311......'130386982117462736970068843678458264655............10.20.101.1....10.10.117.1....10.10.107.1............10.20.101.1....10.10.117.1....10.10.107.1...............10.20.101.1.......20.......101........10.10.117.1.......10.......117........10.10.107.1.......10.......107........'133105607137018827020446003034043231743......%5499719541666903221680525247130152447............10.10.107.1....10.20.101.3....10.2
16:13:51.779754 IP cnode03.prod.dc1xxxxxx.com.netlock1 > 10.11.1.242.62103: Flags [P.], seq 17535:18356, ack 581, win 114, options [nop,nop,TS val 907272194 ecr 430479826], length 821
E..iv.@.?.A.

k.
...#.....AzN.4Y...r.......
6.......0.101.2............10.10.107.1....10.20.101.3....10.20.101.2...............10.10.107.1.......10.......107........10.20.101.3.......20.......101........10.20.101.2.......20.......101........&48035015406784211154602351176101178879......&90570311271901519087524177105072205311............10.10.107.2....10.20.101.1....10.10.117.1............10.10.107.2....10.20.101.1....10.10.117.1...............10.10.107.2.......10.......107........10.20.101.1.......20.......101........10.10.117.1.......10.......117........%5499719541666903221680525247130152447......&24114189515438479738060920219230794689............10.20.101.3....10.20.101.2....10.10.117.2............10.20.101.3....10.20.101.2....10.10.117.2...............10.20.101.3.......20.......101........10.20.101.2.......20.......101........10.10.117.2.......10.......117...
16:13:51.779813 IP 10.11.1.242.62103 > cnode03.prod.dc1.xxxxxxxx.com.netlock1: Flags [.], ack 18356, win 8050, options [nop,nop,TS val 430479829 ecr 907272194], length 0
E..4.c@.@..X
...

and "get_slice" command, but without returning anything.

15:52:55.580697 IP cnode02.prod.dc2.xxxxxxx.com.netlock1 > 10.11.1.242.62108: Flags [P.], seq 1:31, ack 101, win 114, options [nop,nop,TS val 1162997618 ecr 429222099], length 30
E..R..@.=...
.e.
...#.... ...Q ....r&......
EQ.r..h............ get_slice.........

Member

mbroecheler commented Feb 26, 2013

Can you run simple queries or do all queries time out? Try changing to cassandrathrift as the adapter and see if that helps.
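
For reference, the adapter is selected through the storage.backend setting handed to TitanFactory.open(). A minimal sketch, assuming the commons-configuration overload of TitanFactory.open() and a placeholder hostname:

// Minimal sketch: open the graph against the Thrift-based Cassandra adapter.
// "cnode03.example.com" stands in for one of the cluster nodes; adjust to your own config file or code.
import org.apache.commons.configuration.BaseConfiguration;
import org.apache.commons.configuration.Configuration;

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;

public class OpenWithThriftAdapter {
    public static void main(String[] args) {
        Configuration conf = new BaseConfiguration();
        conf.setProperty("storage.backend", "cassandrathrift"); // rather than "cassandra", which this thread shows using the Astyanax client
        conf.setProperty("storage.hostname", "cnode03.example.com");
        TitanGraph g = TitanFactory.open(conf);
        System.out.println(g); // e.g. titangraph[cassandrathrift:...]
        g.shutdown();
    }
}

The same switch can be made in a properties file by setting storage.backend=cassandrathrift, which matches the titangraph[cassandrathrift:...] banner in the next comment.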


protheusfr commented Feb 26, 2013

Hello,

Yes, a single simple query works fine:

gremlin> g = TitanFactory.open("./cassandra.distant");
==>titangraph[cassandrathrift:cnode03.xxxxx.com]
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').map
==>{timestamp=1361551746015, IP=/0.0.0.10, name=srx3600.interco.dc1.xxxxxx.com, date=Fri Feb 22 17:49:06 CET 2013}

After changing to cassandrathrift:
gremlin> g.V('name','srx3600.interco.dc1. xxxxxx.com').outE('srx_deny').count()
Could not read from storage
Display stack trace? [yN] y
com.thinkaurelius.titan.core.TitanException: Could not read from storage
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.readException(StandardTitanGraph.java:160)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.readException(StandardTitanGraph.java:155)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.containsVertexID(StandardTitanGraph.java:189)
at com.thinkaurelius.titan.graphdb.transaction.StandardPersistTitanTx.containsVertex(StandardPersistTitanTx.java:78)
at com.thinkaurelius.titan.graphdb.types.manager.SimpleTypeManager.getType(SimpleTypeManager.java:137)
at com.thinkaurelius.titan.graphdb.transaction.AbstractTitanTx.getExisting(AbstractTitanTx.java:166)
at com.thinkaurelius.titan.graphdb.transaction.AbstractTitanTx.getExistingVertex(AbstractTitanTx.java:156)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.loadRelations(StandardTitanGraph.java:401)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.loadRelations(StandardTitanGraph.java:324)
at com.thinkaurelius.titan.graphdb.transaction.StandardPersistTitanTx.loadRelations(StandardPersistTitanTx.java:155)
at com.thinkaurelius.titan.graphdb.vertices.AbstractTitanVertex.ensureLoadedEdges(AbstractTitanVertex.java:88)
at com.thinkaurelius.titan.graphdb.vertices.StandardTitanVertex.getRelations(StandardTitanVertex.java:68)
at com.thinkaurelius.titan.graphdb.query.SimpleAtomicQuery.edges(SimpleAtomicQuery.java:473)
at com.thinkaurelius.titan.graphdb.vertices.AbstractTitanVertex.getEdges(AbstractTitanVertex.java:177)
at com.tinkerpop.gremlin.pipes.transform.VerticesEdgesPipe.processNextStart(VerticesEdgesPipe.java:46)
at com.tinkerpop.gremlin.pipes.transform.VerticesEdgesPipe.processNextStart(VerticesEdgesPipe.java:16)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:84)
at com.tinkerpop.pipes.util.Pipeline.next(Pipeline.java:115)
at com.tinkerpop.pipes.util.PipeHelper.counter(PipeHelper.java:108)
at com.tinkerpop.gremlin.java.GremlinPipeline.count(GremlinPipeline.java:1080)
at com.tinkerpop.pipes.util.PipesFluentPipeline$count.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112)
at groovysh_evaluate.run(groovysh_evaluate:46)
at groovysh_evaluate$run.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at groovysh_evaluate$run.call(Unknown Source)
at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:67)
at org.codehaus.groovy.tools.shell.Interpreter$evaluate.call(Unknown Source)
at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:152)
at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:114)
at org.codehaus.groovy.tools.shell.Shell$leftShift$0.call(Unknown Source)
at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:88)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1071)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:128)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:148)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:272)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:52)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:137)
at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:57)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1071)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:128)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:148)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:66)
at com.thinkaurelius.titan.tinkerpop.gremlin.Console.<init>(Console.java:60)
at com.thinkaurelius.titan.tinkerpop.gremlin.Console.<init>(Console.java:67)
at com.thinkaurelius.titan.tinkerpop.gremlin.Console.main(Console.java:72)
Caused by: com.thinkaurelius.titan.diskstorage.PermanentStorageException: Permanent failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.convertException(CassandraThriftKeyColumnValueStore.java:255)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.containsKey(CassandraThriftKeyColumnValueStore.java:169)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferedKeyColumnValueStore.containsKey(BufferedKeyColumnValueStore.java:31)
at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ConsistentKeyLockStore.containsKey(ConsistentKeyLockStore.java:77)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.containsVertexID(StandardTitanGraph.java:184)
... 64 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:552)
at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:536)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.containsKey(CassandraThriftKeyColumnValueStore.java:166)
... 67 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 78 more

I think we can exclude a Cassandra failure; this Cassandra cluster works fine with other applications (key/value workloads through Hector).

Member

mbroecheler commented Feb 27, 2013

Hey,
this is strange. It seems to time out on a very simple Cassandra operation: checking the existence of a small key slice. Can you reliably execute all other queries while this one always times out? Meaning, is it deterministically reproducible?
Thank you,
Matthias


protheusfr commented Feb 27, 2013

Yes, absolutely: every request like "g.V('name','srx3600.interco.dc1.xxxxxx.com').map" works fine.
But when I try to access the vertices connected to this one (there are more than 1 million vertices connected to this "super-node"), it fails systematically.

The same request, "g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny').count()", with a vertex degree of 10,000 on srx3600.interco.dc1.xxxxxx.com works fine (quite slow, but it returns a response).

Member

mbroecheler commented Feb 27, 2013

Do you have vertex-centric indices on these supernode vertices? Those would allow you to pull out the edges you want quickly. Otherwise, your queries might attempt to read all edges, which, at 1 million edges, is likely to time out because it's too much data. Also, try to limit the size of the result set using [0..1000] after the outE. Does that work?
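
For anyone hitting the same wall, here is a minimal sketch (not from the original thread) of what such a vertex-centric query could look like through the Blueprints VertexQuery builder that Titan exposes on its vertices; the label and key names are taken from the setup code later in this thread, and the exact builder methods may differ between Titan versions:

    // assumes an edge label 'srx_deny' whose primaryKey is the 'time' property key
    v  = g.V('name','srx3600.interco.dc1.xxxxxx.com').next()
    t1 = System.currentTimeMillis()
    t0 = t1 - 3600000L   // only the last hour of events
    // restricting on the primary-key property plus a limit lets Titan bound the
    // column slice it requests instead of pulling the full million-edge row
    edges = v.query().labels('srx_deny').interval('time', t0, t1).limit(1000).edges()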


protheusfr commented Feb 27, 2013

What do you mean exactly by vertex-centric indices?

I have created some indexes on vertices and edges in order to avoid full scans:

    if (g.getType("time") == null || (g.getType("time") != null && !g.getType("time").isPropertyKey())) {
        TitanKey time = g.makeType().name("time").dataType(Long.class).functional().indexed().makePropertyKey();
        TitanLabel log = g.makeType().name("log").primaryKey(time).makeEdgeLabel();
        TitanLabel connected = g.makeType().name("connected").primaryKey(time).makeEdgeLabel();
        TitanLabel srx_create = g.makeType().name(SRX_CREATE).primaryKey(time).makeEdgeLabel();
        TitanLabel srx_close = g.makeType().name(SRX_CLOSE).primaryKey(time).makeEdgeLabel();
        TitanLabel srx_deny = g.makeType().name(SRX_DENY).primaryKey(time).makeEdgeLabel();

    }

    // Vertex index creation
    if (g.getType("name") == null || ((g.getType("name") != null) && !g.getType("name").isPropertyKey())) {
        TitanKey name = g.makeType().name("name").dataType(String.class).unique().functional().indexed().makePropertyKey();
    }
    if (g.getType("IP") == null || (g.getType("IP") != null && !g.getType("IP").isPropertyKey())) {
        TitanKey name = g.makeType().name("IP").dataType(String.class).functional().makePropertyKey();
    }
    if (g.getType("action") == null || (g.getType("action") != null && !g.getType("action").isPropertyKey())) {
        TitanKey name = g.makeType().name("action").dataType(String.class).functional().makePropertyKey();
    }

After that, I create the initial vertex (the super node) by calling:

public Vertex CreateSrx(String FQDN, InetAddress IP, String OS){
    Vertex v = g.addVertex(null);
    v.setProperty("name", FQDN);
    v.setProperty("date", new Date().toString());
    v.setProperty("timestamp", System.currentTimeMillis());
    v.setProperty("IP",IP.toString());
    return v;
}

And I link events to this node through:

public Vertex CreateEvent(Vertex Server, String GivenLevel, Long TimeStamp, String Message, String Action ) throws IllegalArgumentException{
    if(!Level.contains(GivenLevel)) {
        throw new IllegalArgumentException("Given Level is not correct");
    }
    Vertex v = g.addVertex(null);
    v.setProperty("level", GivenLevel);
    v.setProperty("timestamp",TimeStamp);
    v.setProperty("message", Message);
    v.setProperty("action", Action);

    Edge edge = null;
    switch (Action) {
        case SRX_CREATE:
            edge = g.addEdge(null, Server, v, "srx_create");
            edge.setProperty("time", System.currentTimeMillis());
            break;
        case SRX_CLOSE :
            edge = g.addEdge(null, Server, v, "srx_close");
            edge.setProperty("time", System.currentTimeMillis());
            break;
        case SRX_DENY :
            edge = g.addEdge(null, Server, v, "srx_deny");
            edge.setProperty("time", System.currentTimeMillis());
            break;
        default:
            edge = g.addEdge(null, Server, v, "srx_create");
            edge.setProperty("time", System.currentTimeMillis());
    }
    return v;
}

For the test with a range limit [0..10000], THIS IS VERY STRANGE:

First call to the initial vertex, everything goes well:

==>titangraph[cassandrathrift:cnode03.prod.dc1.xxxxxx.com]
gremlin> g.V('name','srx3600.interco.dc1.xxxxx.com').map
==>{timestamp=1361551746015, IP=/0.0.0.10, name=srx3600.interco.dc1.xxxxxxx.com, date=Fri Feb 22 17:49:06 CET 2013}

First call to the edges:

gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..1000]
Could not read from storage
Display stack trace? [yN] y
com.thinkaurelius.titan.core.TitanException: Could not read from storage
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.readException(StandardTitanGraph.java:160)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.readException(StandardTitanGraph.java:155)
… (same exception as shown previously)

After a while (about 1 minute), I call again with [0..1]:
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..1]
==>e[479:4:36028797018964102][4-srx_deny->128]
==>e[1307:4:36028797018964102][4-srx_deny->404]

OK… so trying to count:
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..100].count()
==>101

Well, we try with a larger range:
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..1000].count()
==>1001

Larger again (the same range previously threw an exception):
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..10000].count()
==>10001

:-/ …

Larger again:
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..100000].count()
==>62624

!!

Can you make sense of this?


protheusfr commented Feb 27, 2013

Hmmm, my mistake (even if it doesn't explain the previously observed behavior). I replaced:

    switch (Action) {
        case SRX_CREATE:
            edge = g.addEdge(null, Server, v, "srx_create");
            edge.setProperty("time", System.currentTimeMillis());
            break;
        case SRX_CLOSE :
            edge = g.addEdge(null, Server, v, "srx_close");
            edge.setProperty("time", System.currentTimeMillis());
            break;
        case SRX_DENY :
            edge = g.addEdge(null, Server, v, "srx_deny");
            edge.setProperty("time", System.currentTimeMillis());
            break;
        default:
            edge = g.addEdge(null, Server, v, "srx_create");
            edge.setProperty("time", System.currentTimeMillis());
    }

by

    switch (Action) {
        case SRX_CREATE:
            edge = g.addEdge(null, Server, v, SRX_CREATE);
            edge.setProperty("time", System.currentTimeMillis());
            break;
        case SRX_CLOSE :
            edge = g.addEdge(null, Server, v, SRX_CLOSE);
            edge.setProperty("time", System.currentTimeMillis());
            break;
        case SRX_DENY :
            edge = g.addEdge(null, Server, v, SRX_DENY);
            edge.setProperty("time", System.currentTimeMillis());
            break;
        default:
            edge = g.addEdge(null, Server, v, "connected");
            edge.setProperty("time", System.currentTimeMillis());
    }

so the index is now really used.

BUT I've found another problem:

gremlin> g.V('name','srx3600.interco.dc1.xxxxxxx.com').outE('RT_FLOW_SESSION_CREATE')[0..1]
Could not read from storage
Display stack trace? [yN] y
com.thinkaurelius.titan.core.TitanException: Could not read from storage
....

Caused by: org.apache.thrift.transport.TTransportException: Frame size (22818648) larger than max length (16384000)!
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)

This is caused by the default "thrift_max_message_length_in_mb: 16" option in cassandra.yaml.

The question is: why does it try to fetch more than 16 MB of data just to retrieve two results of about 30 bytes each?
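
A likely explanation (an inference, not something stated in this thread): the [0..1] range step is applied on the client after Titan has already asked Cassandra for the vertex's edge columns, so the Thrift response still carries a large slice of the million-edge row. As a stopgap, the Thrift limits can be raised in cassandra.yaml on every node (illustrative values below; the message limit should be at least as large as the framed transport size), but the more durable fix is to bound the read with a vertex-centric index as sketched earlier.

    # cassandra.yaml (Cassandra 1.x) -- illustrative values, restart required
    thrift_framed_transport_size_in_mb: 60
    thrift_max_message_length_in_mb: 64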

topiaruss commented Apr 13, 2013

@protheusfr Was there a resolution to this?

TimLudwinski commented May 19, 2014

I'm running into this issue in Titan 0.4.1 with a single Cassandra instance. Here is a simple example that demonstrates the issue on an empty server:

> v1 = g.addVertex(["key": "google.com", "keytype": "domain"])
> v2 = g.addVertex(["key": ".com", "keytype": "tld"])
> for(i = 0; i < 800000; i++) {
    g.addEdge(null, v1, v2, 'ig_label', ["sourceDocument": "testdoc-123123123", "sourceDocumentType": "context", "timestamp": i, "scope": "public", "inKeyType": "domain", "outKeyType": "tld"]);
    g.addEdge(null, v2, v1, 'ig_label', ["sourceDocument": "testdoc-123123123", "sourceDocumentType": "context", "timestamp": i, "scope": "public", "inKeyType": "tld", "outKeyType": "domain"]);
    g.commit();  //Takes longer but ensures we don't run into any problems.  
}

> v1.key
==>google.com
> v2.bothE[0] //This works
> v2.outE[0] //This works
> v2.outE[500000] //This works but is incredibly slow
> v2.outE[550000] //This fails
> v2.inE[0] //This fails with no edge being returned.  
> v2.outE.count() //This crashes the cassandra server

I didn't have a vertex-centric index on this one, but the production server that is hitting the same problem does have an index on timestamp. This really stinks because I can't access the incoming edges at all.
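
For reference, a minimal sketch (not from the thread) of how that vertex-centric index could be declared up front in Titan 0.4.x, before any edges are loaded; the builder names follow the 0.4 type API and the key/label names are taken from the example above, so treat it as an illustration rather than a drop-in fix:

    // give the edge label a sort key on 'timestamp', which is what
    // Titan 0.4 calls a vertex-centric index
    ts = g.makeKey("timestamp").dataType(Integer.class).make()
    g.makeLabel("ig_label").sortKey(ts).make()
    g.commit()
    // edges added afterwards can then be read in bounded chunks, e.g.:
    // v2.query().labels("ig_label").interval("timestamp", 0, 100000).limit(1000).edges()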

Member

mbroecheler commented May 19, 2014

Yes, at over a million edges per vertex you are getting into "the danger zone", in particular without a vertex-centric index. There isn't much we can do about it due to the limitations of Cassandra that cause this.

However, we will introduce partitioned vertices in Titan 0.5 to address the issue of extreme supernodes like the one you are encountering.

On Mon, May 19, 2014 at 11:37 AM, TimLudwinski notifications@github.comwrote:

I'm running into this issue in Titan 0.4.1 with a single cassandra
instance. Here is a simple example that displays this issue on an empty
server:

v1 = g.addVertex(["key": "google.com", "keytype": "domain"])
v2 = g.addVertex(["key": "google.com", "keytype": "domain"])
for(i = 0; i < 800000; i++) {
g.addEdge(null, v1, v2, 'ig_label', ["sourceDocument": "testdoc-123123123", "sourceDocumentType": "context", "timestamp": i, "scope": "public", "inKeyType": "domain", "outKeyType", "tld"]);
g.addEdge(null, v2, v1, 'ig_label', ["sourceDocument": "testdoc-123123123", "sourceDocumentType": "context", "timestamp": i, "scope": "public", "inKeyType": "tld", "outKeyType", "domain"]);
g.commit(); //Takes longer but ensures we don't run into any problems.
}

v1.key
==>google.com
v1.bothE[0] //This works
v1.inE[0] //This works
v1.inE[500000] //This works but is incredibly slow
v1.inE[550000] //This fails
v1.outE[0] //This fails
v1.outE.count() //This crashes the cassandra server

I didn't have a vertex centric index on this one, but on the production
server that is having the same problem there is an index on timestamp.


Reply to this email directly or view it on GitHubhttps://github.com/thinkaurelius/titan/issues/11#issuecomment-43540270
.

Matthias Broecheler
http://www.matthiasb.com

@barnab5012

barnab5012 commented May 19, 2014

Good news for Titan 0.5.

@anvie

anvie commented Aug 12, 2014

+1 for partitioned vertices

@protheusfr

protheusfr commented Nov 24, 2014

Good news, I will try it.

@mbroecheler mbroecheler modified the milestone: Backlog Jan 21, 2015
