Re: HBase ScannerTimeoutException


HadoopMarc <m.c.d...@...>
 

Hi Joseph,

Sounds like OLAP is not going to help you here (you would need a Gremlin step that queries your external database, or a custom VertexProgram). What you need is a JanusGraph index on a unique property of your vertices. Then a query like g.V().has('yourprop', 'yourpropvalue').next() will return the vertex through the index, rather than through an HBase table scan like the one you did before. This approach also allows you to make your code multi-threaded, as long as you add vertices and edges in the right order (each edge only after both of its vertices exist).
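The ordering point can be illustrated with a two-phase load. First create all vertices in parallel and wait for that phase to finish. Then create the edges, also in parallel, resolving each endpoint through the unique index instead of a scan. The sketch below is a plain-Java model under stated assumptions, not JanusGraph code: ToyGraph, TwoPhaseLoader and the ConcurrentHashMap standing in for the index are all hypothetical. In JanusGraph itself the map would correspond to a unique composite index (built with JanusGraphManagement.buildIndex(...).unique().buildCompositeIndex()), and lookup() would correspond to a query like g.V().has('yourprop', value).next(). A real loader would also give each task a batch of elements and its own transaction, rather than a single element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for a graph (not JanusGraph). The map plays
// the role of a unique index on "yourprop": property value -> vertex id.
class ToyGraph {
    final Map<String, Long> uniqueIndex = new ConcurrentHashMap<>();
    final Set<String> edges = ConcurrentHashMap.newKeySet();
    private final AtomicLong nextId = new AtomicLong();

    void addVertex(String propValue) {
        uniqueIndex.putIfAbsent(propValue, nextId.incrementAndGet());
    }

    // Index lookup by property value (the role g.V().has('yourprop', value) plays); no scan.
    long lookup(String propValue) {
        Long id = uniqueIndex.get(propValue);
        if (id == null) {
            throw new IllegalStateException("no vertex with yourprop=" + propValue);
        }
        return id;
    }

    void addEdge(String from, String to) {
        edges.add(lookup(from) + "->" + lookup(to));
    }
}

class TwoPhaseLoader {
    // Phase 1 adds every vertex in parallel; runPhase waits for all of them before
    // phase 2 starts, so each edge task can resolve both endpoints through the index.
    static void load(ToyGraph graph, List<String> vertexValues, List<String[]> edgePairs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> vertexTasks = new ArrayList<>();
            for (String v : vertexValues) {
                vertexTasks.add(() -> { graph.addVertex(v); return null; });
            }
            runPhase(pool, vertexTasks);

            List<Callable<Void>> edgeTasks = new ArrayList<>();
            for (String[] e : edgePairs) {
                edgeTasks.add(() -> { graph.addEdge(e[0], e[1]); return null; });
            }
            runPhase(pool, edgeTasks);
        } finally {
            pool.shutdown();
        }
    }

    // invokeAll blocks until every task is done; Future.get() rethrows task failures.
    private static void runPhase(ExecutorService pool, List<Callable<Void>> tasks) {
        try {
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load interrupted", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("load task failed: " + ex.getCause().getMessage(), ex.getCause());
        }
    }

    public static void main(String[] args) {
        ToyGraph graph = new ToyGraph();
        load(graph, Arrays.asList("a", "b", "c"),
                Arrays.asList(new String[] {"a", "b"}, new String[] {"b", "c"}), 4);
        System.out.println(graph.uniqueIndex.size() + " vertices, " + graph.edges.size() + " edges");
    }
}
```

The wait between the two phases is the design choice that matters: without it, an edge task could run before one of its endpoints exists, and the index lookup would throw.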

The JanusGraph and TinkerPop docs on bulk loading might provide further insights:
http://docs.janusgraph.org/latest/bulk-loading.html
http://tinkerpop.apache.org/docs/current/reference/#_loading_with_bulkloadervertexprogram

The BulkLoaderVertexProgram is in effect an OLAP approach, but it assumes that all your data is organized before you start loading the graph.

Cheers,
Marc

On Friday, 12 May 2017 21:34:01 UTC+2, Joseph Obernberger wrote:

Hi Marc - thank you for the reply. I've written Java code that takes some data and uses it to generate a graph. After that data is put into JanusGraph, I loop over all the nodes in the graph so that I can query an external database and add edges/nodes where appropriate for this particular task. This is all in Java.

I've not used an OLAP query, but it looks like it runs straight from Gremlin, so I should be able to do it from Java? Still investigating.

-Joe


On 5/11/2017 11:14 AM, HadoopMarc wrote:
Hi Joseph,

If you want to process all vertices (a map operation), you need an OLAP query (this currently only works for read-only tasks):
http://docs.janusgraph.org/latest/hadoop-tp3.html
http://tinkerpop.apache.org/docs/3.2.3/reference/#sparkgraphcomputer

If you want to filter the total set of vertices, you need an index on one or more properties of your vertices:
http://docs.janusgraph.org/latest/indexes.html

What do you want to accomplish apart from looping over the vertices in your graph?

HTH,
Marc

On Thursday, 11 May 2017 16:18:50 UTC+2, Joseph Obernberger wrote:
Hi All - I'm using a loop to do a task on all vertices in a fairly large
graph (million+ nodes), and the operation that I'm doing takes some
time. I'm getting an
org.apache.hadoop.hbase.client.ScannerTimeoutException 20091230ms passed
since the last invocation, timeout is currently set to 60000.

Is there a better way to loop through all vertices besides something like:
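-------------------
import java.util.Iterator;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraphTransaction;

// graph is an already-opened JanusGraph instance
JanusGraphTransaction vertTrans = graph.newTransaction();
Iterator<Vertex> vertices = vertTrans.vertices(); // iterates over every vertex (HBase table scan)

while (vertices.hasNext()) {
    Vertex vertex = vertices.next(); // advance the iterator, otherwise the loop never ends
    // do stuff with vertex
}
vertTrans.commit();
------------------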

?

Thank you!

-Joe
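The timeout in the loop above comes from how HBase scans work: the region server keeps a lease on each open scanner, and if the client does not come back for more rows within the scanner timeout (60000 ms here), the server drops the scanner and the next call fails with ScannerTimeoutException. Doing slow per-vertex work inside the while loop keeps the scan open for the whole job. One common workaround, not something proposed in this thread, is to drain the vertex ids first (for example with g.V().id().toList()) and do the slow work only after the scan has finished. The sketch below is a plain-Java model, not HBase or JanusGraph code: LeasedScanner, ScanPatterns and the logical clock are hypothetical. It also simplifies HBase, whose client fetches rows in batches, so the real lease is only checked when a new batch is requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical toy model, not HBase client code: a scanner whose lease expires
// when more than timeoutMs (on a logical clock) passes between two next() calls.
class LeasedScanner implements Iterator<Long> {
    private final Iterator<Long> rows;
    private final long timeoutMs;
    private final AtomicLong clock;
    private long lastCall;

    LeasedScanner(List<Long> rows, long timeoutMs, AtomicLong clock) {
        this.rows = rows.iterator();
        this.timeoutMs = timeoutMs;
        this.clock = clock;
        this.lastCall = clock.get();
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    @Override
    public Long next() {
        long now = clock.get();
        if (now - lastCall > timeoutMs) {
            throw new IllegalStateException((now - lastCall) + "ms passed since the last invocation");
        }
        lastCall = now;
        return rows.next();
    }
}

class ScanPatterns {
    // Pattern from the question: slow work runs between next() calls,
    // so the time spent per element counts against the scanner lease.
    static int processWhileScanning(Iterator<Long> scanner, LongConsumer slowWork) {
        int done = 0;
        while (scanner.hasNext()) {
            slowWork.accept(scanner.next());
            done++;
        }
        return done;
    }

    // Workaround: drain the ids first (nothing slow between next() calls),
    // then do the slow work once the scan is finished.
    static int collectThenProcess(Iterator<Long> scanner, LongConsumer slowWork) {
        List<Long> ids = new ArrayList<>();
        scanner.forEachRemaining(ids::add);
        for (long id : ids) {
            slowWork.accept(id);
        }
        return ids.size();
    }

    public static void main(String[] args) {
        AtomicLong clock = new AtomicLong();
        List<Long> rows = Arrays.asList(1L, 2L, 3L);
        // Each call advances the logical clock by 70 s, longer than the 60 s lease.
        LongConsumer slowWork = id -> clock.addAndGet(70_000);
        try {
            processWhileScanning(new LeasedScanner(rows, 60_000, clock), slowWork);
        } catch (IllegalStateException e) {
            System.out.println("scan failed: " + e.getMessage());
        }
        clock.set(0);
        int done = collectThenProcess(new LeasedScanner(rows, 60_000, clock), slowWork);
        System.out.println("processed " + done + " ids after the scan");
    }
}
```

In JanusGraph terms, processWhileScanning corresponds to the original while loop, and collectThenProcess to fetching the ids up front and then loading each vertex by id (for example in its own short transaction) while doing the external lookups.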
