Re: PageRank on Large Graph


HadoopMarc <bi...@...>
 

Hi Joe,

No, not exactly: the TinkerPop recipe points at spark-submit as the source of most of the version conflicts. spark-submit is just a big wrapper around the Spark launch API; it sets up the environment, but not in an application-friendly way. I would first try from the Gremlin Console, for which the recipe was written. Running the OLAP PageRank from a Java project without spark-submit will take some effort to get the classpath right.

HTH,   Marc

On Tuesday, 26 September 2017 00:46:26 UTC+2, Joseph Obernberger wrote:

Thank you Marc.  I assume this would be java code that would be executed via spark-submit?

-Joe


On 9/25/2017 3:21 PM, HadoopMarc wrote:
Hi Joe,

Maybe a suggestion after all. I believe you ran the PageRankVertexProgram directly on the JanusGraph instance, but it should also be possible to run it on a HadoopGraph with compute(SparkGraphComputer) via JanusGraph's HBaseInputFormat. That would at least parallelize the table scan to the number of HBase regions. In my previous answer I assumed you did that!
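
As a rough illustration of the setup described above, a HadoopGraph properties file for reading JanusGraph data out of HBase might look like the sketch below. The hostname, table name, output location and Spark master are placeholder assumptions, and key names differ between versions (older TinkerPop releases used gremlin.hadoop.graphInputFormat instead of gremlin.hadoop.graphReader), so compare it with any sample files shipped with your distribution before relying on it.

```properties
# Sketch only: all values below are placeholders, not tested settings.
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
# Read vertices through JanusGraph's HBase input format (one split per region).
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
# Assumed output directory name.
gremlin.hadoop.outputLocation=output

# Connection settings for the underlying JanusGraph storage (assumed values).
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=localhost
janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph

# Spark settings; the master is an assumption, use your cluster's value.
spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer
```

With such a file in place, the graph would be opened in the Gremlin Console via GraphFactory.open() on that file, and the same PageRankVertexProgram submitted through graph.compute(SparkGraphComputer) rather than graph.compute().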

Cheers,     Marc

On Monday, 25 September 2017 17:24:55 UTC+2, Joseph Obernberger wrote:

It reminds me of that one too!  At present, I'm locked in with HBase, so I can't make the switch to Cassandra very easily.  I did try:
result = graph.compute().program(PageRankVertexProgram.build().create()).submit().get()

It took a little over 8 hours to run, but did complete once I adjusted the hbase.client.scanner.timeout.period to something very long.  Interestingly, I had to modify that in the included jar file, not in the file in /etc/hbase/conf. 
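
For reference, the scanner timeout mentioned above is a standard HBase client property, set in milliseconds in hbase-site.xml. The value below is only an arbitrary placeholder (one hour), not the value actually used in this run; as noted, in this setup the copy of the file inside the included jar was the one that took effect.

```xml
<!-- Sketch only: 3600000 ms (one hour) is an assumed placeholder value. -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>3600000</value>
</property>
```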

I would really like to get this run time way down, but I'm not sure what other method to try.

-Joe


On 9/22/2017 1:05 PM, HadoopMarc wrote:
Hi Joe,

This question reminds me of an earlier discussion we had on the performance of OLAP traversals for janusgraph-hbase. My conclusion there was that janusgraph-hbase needs a better HBaseInputFormat that delivers more than one partition per HBase region. I guess PageRank suffers from that in the same way. Do you maybe have the option to use Cassandra, which has a configurable cassandra.input.split.size? I did not try this myself.

HTH,    Marc

On Friday, 22 September 2017 15:41:12 UTC+2, Joseph Obernberger wrote:
Hi All - I've been experimenting with SparkGraphComputer, and have it
working, but I'm having performance issues.  What is the best way to run
PageRank against a very large graph stored inside of JanusGraph?

Thank you!

-Joe

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/1bf6c7c5-84b6-483e-982c-c299fca3e8ef%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
