Best way to load exported medium-sized graphs


carlos.bobed@...
 

Hi Marc, 

from the gremlin console, I get: 
.with(IO.reader, IO.graphml).read().iterate()] - Batch too large
org.apache.tinkerpop.gremlin.jsr223.console.RemoteException: Batch too large
at org.apache.tinkerpop.gremlin.console.jsr223.DriverRemoteAcceptor.submit(DriverRemoteAcceptor.java:184)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:110)
at org.apache.tinkerpop.gremlin.console.Console$_executeInShell_closure19.doCall(Console.groovy:419)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at groovy.lang.Closure.call(Closure.java:405)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:2246)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:2226)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:2276)
at org.codehaus.groovy.runtime.dgm$199.doMethodInvoke(Unknown Source)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
at org.apache.tinkerpop.gremlin.console.Console.executeInShell(Console.groovy:396)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:163)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:502)


Meanwhile, in the server log I get, on the one hand, a lot of warnings such as: 
1692178 [JanusGraph Cluster-nio-worker-1] WARN com.datastax.driver.core.RequestHandler - Query '[20 statements, 80 bound values] BEGIN UNLOGGED BATCH INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column... [truncated output]' generated server side warning(s): Batch of prepared statements for [janusgraph.edgestore] is of size 7626, exceeding specified threshold of 5120 by 2506.


And finally, an exception about a temporary failure in the storage backend: 
1692180 [gremlin-server-session-1] INFO org.janusgraph.diskstorage.util.BackendOperation - Temporary exception during backend operation [CacheMutation]. Attempting backoff retry.
org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at io.vavr.API$Match$Case0.apply(API.java:3174)
at io.vavr.API$Match.of(API.java:3137)
at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$0(CQLKeyColumnValueStore.java:123)
at org.janusgraph.diskstorage.cql.CQLStoreManager.mutateManyUnlogged(CQLStoreManager.java:526)
at org.janusgraph.diskstorage.cql.CQLStoreManager.mutateMany(CQLStoreManager.java:457)
at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStoreManager.mutateMany(ExpectedValueCheckingStoreManager.java:79)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:94)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:91)
at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:91)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:133)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.mutate(CacheTransaction.java:86)
at org.janusgraph.diskstorage.keycolumnvalue.cache.KCVSCache.mutateEntries(KCVSCache.java:65)
at org.janusgraph.diskstorage.BackendTransaction.mutateEdges(BackendTransaction.java:200)
at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(StandardJanusGraph.java:628)
at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:731)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1438)
at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph$GraphTransaction.doCommit(JanusGraphBlueprintsGraph.java:297)
at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:104)
at org.apache.tinkerpop.gremlin.structure.io.graphml.GraphMLReader.readGraph(GraphMLReader.java:132)
at org.apache.tinkerpop.gremlin.process.traversal.step.sideEffect.IoStep.read(IoStep.java:132)
at org.apache.tinkerpop.gremlin.process.traversal.step.sideEffect.IoStep.processNextStart(IoStep.java:110)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:128)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:38)
at org.apache.tinkerpop.gremlin.process.traversal.Traversal.iterate(Traversal.java:207)


... 
caused by: 

Caused by: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at io.vavr.control.Try.of(Try.java:62)
at io.vavr.concurrent.FutureImpl.lambda$run$2(FutureImpl.java:199)
... 5 more
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at com.datastax.driver.core.Responses$Error.asException(Responses.java:181)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:215)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
at com.datastax.driver.core.RequestHandler.access$2600(RequestHandler.java:61)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:1011)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:814)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1262)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1180)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)


I'll keep the logs this time just in case more detail is required. 

Best, 

Carlos 


hadoopmarc@...
 

Hi Carlos,

I read the preceding discussion with Stephen Mallette, which says: "From the logs, while loading this graph, the Cassandra driver is almost always warning that all batches are over the limit of 5120 (which I haven't found yet where to modify ...)."
A complete stacktrace would help indeed, but it strikes me that 5120 equals 20 x 256, while the Cassandra CQL driver has the following defaults:

storage.cql.batch-statement-size                 The number of statements in each batch                                  Integer   20
storage.cql.remote-max-requests-per-connection   The maximum number of requests per connection for a remote datacenter  Integer   256
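
If the batch size is indeed the culprit, a minimal sketch of how it could be lowered in the graph's properties file (the file name and the value 10 below are only examples, not tested recommendations):

```properties
# conf/janusgraph-cql.properties (example fragment)
storage.backend=cql
storage.hostname=127.0.0.1
# statements per unlogged batch; the driver default is 20
storage.cql.batch-statement-size=10
```

On the Cassandra side, the server-side thresholds are batch_size_warn_threshold_in_kb and batch_size_fail_threshold_in_kb in cassandra.yaml.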


Best wishes,    Marc


cbobed <cbobed@...>
 

Hi all,

I'm trying to load a GraphML export into JanusGraph 0.5.3. The graph is not that big (1.68M nodes, 8.8M edges). However, I reach a point where the TinkerPop layer tells me that the batch is too large and it crashes (I suspect there might be an edge with far too much information, but finding it is difficult).

I've tried to split the GraphML file into non-overlapping partitions, but JanusGraph does not seem to honor the IDs (I'm not actually sure whether they are discarded at the TinkerPop or the JanusGraph level), and when I reach the split edges part, it inserts new vertices for both source and target.

Has anyone faced this GraphML loading problem before? Should I try to convert my exported graph to GraphSON so that everything runs more smoothly?
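
For instance, I guess the conversion itself could be done in the Gremlin Console by staging the export in an in-memory TinkerGraph (untested sketch; the file names are placeholders):

```groovy
// Read the GraphML export into an in-memory TinkerGraph ...
graph = TinkerGraph.open()
graph.io(IoCore.graphml()).readGraph('export.graphml')
// ... and write it back out as GraphSON
graph.io(IoCore.graphson()).writeGraph('export.json')
```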

What are the recommendations for dealing with this kind of loading? I'm trying to avoid implementing my own graph loader at this point.
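
In case it helps to frame the question: if I do end up writing a loader, I imagine something along these lines (an untested sketch for the Gremlin Console; the file names, the properties file, and batchSize are placeholders):

```groovy
// Untested sketch: stage the GraphML export in an in-memory TinkerGraph,
// then copy it into JanusGraph with periodic commits to keep batches small.
source = TinkerGraph.open()
source.io(IoCore.graphml()).readGraph('export.graphml')

graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')
g = graph.traversal()

batchSize = 1000
count = 0
idMap = [:]   // original vertex id -> JanusGraph vertex id

source.vertices().each { v ->
    nv = graph.addVertex(T.label, v.label())
    v.properties().each { p -> nv.property(p.key(), p.value()) }
    idMap[v.id()] = nv.id()
    if (++count % batchSize == 0) graph.tx().commit()
}
graph.tx().commit()

source.edges().each { e ->
    outV = g.V(idMap[e.outVertex().id()]).next()
    inV  = g.V(idMap[e.inVertex().id()]).next()
    ne = outV.addEdge(e.label(), inV)
    e.properties().each { p -> ne.property(p.key(), p.value()) }
    if (++count % batchSize == 0) graph.tx().commit()
}
graph.tx().commit()
```

The id map is exactly what the split-file approach loses, which would explain why the edges end up creating fresh vertices.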

Thank you very much in advance,

Best,

Carlos Bobed