Hi Joe,
Thanks for reporting back. So, it indeed seems to be the same problem as for OLAP traversals: the input splits of HBaseInputFormat have the size of a complete region, which is a bit too much for SparkGraphComputer. I think it should be fairly easy to adapt JanusGraph's HBaseInputFormat so that the splits coming from the parent HBase TableInputFormat are divided into smaller parts, say smaller than some configurable janusgraph.hbase.mapreduce.maxinputsplitsize=128M. All the necessary variables and methods are present in HBase's TableInputFormat. I plan to do this some time in the future, but please do not rely on it. If someone else wants to take up the work sooner, please create a ticket first so that others know.
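To illustrate the idea (this is not JanusGraph code, and the property name above is only a proposal, not an existing setting), the number of sub-splits per region could be derived from the region size and the configured maximum, rounding up so no sub-split exceeds the maximum:

```java
public class SplitSizing {
    // Hypothetical helper: given a region's size in bytes and a maximum
    // input split size, compute how many sub-splits to cut the region into.
    static int numSubSplits(long regionSizeBytes, long maxSplitSizeBytes) {
        if (regionSizeBytes <= 0) return 1;
        // Ceiling division: round up so each sub-split stays under the maximum.
        return (int) ((regionSizeBytes + maxSplitSizeBytes - 1) / maxSplitSizeBytes);
    }

    public static void main(String[] args) {
        long maxSplit = 128L * 1024 * 1024;       // 128M, the proposed default above
        long region   = 10L * 1024 * 1024 * 1024; // e.g. a 10G region
        System.out.println(numSubSplits(region, maxSplit)); // 80
    }
}
```

Each sub-split would then cover a fraction of the region's row-key range, so Spark gets 80 tasks for that region instead of one.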
Cheers, Marc
On Wednesday, 27 September 2017 at 22:30:33 UTC+2, Joseph Obernberger wrote:
Thank you Marc. That runs on my cluster, but takes a very long
time. If I try it on a larger graph, the YARN jobs run out of
heap. Right now I'm giving them 10G each.
On a small graph, I can run it OK, and I can run the
BulkDumperVertexProgram as well. What I can't do, when I run with
SparkGraphComputer, is look at the results.
After running:
result = graph.compute(SparkGraphComputer).program(PageRankVertexProgram.build().create()).submit().get()
I can do a result.memory().runtime, which returns a number (in my
case 609821).
I then do:
g = result.graph().traversal(computer(SparkGraphComputer))
Unfortunately, any command with g gives the same error - for example:
g.V().valueMap() returns:
java.io.IOException: No input paths specified in job
Since this is a small graph, if I run it without
SparkGraphComputer, those commands on g work fine, such as:
g.V(id).valueMap('gremlin.pageRankVertexProgram.pageRank')
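(For reference, a guess at the cause, sketched rather than taken from the poster's actual setup: the result graph of a SparkGraphComputer run is read back from the job's output location, so the Hadoop graph properties file needs a graph writer and output location configured, roughly along these lines; class names are assumed from the TinkerPop/JanusGraph docs of this era:)

```
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
# Without a graphWriter and outputLocation there is nothing for the result
# graph's traversal to read back, hence "No input paths specified in job".
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.outputLocation=output
```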
Trying to find any method to run PageRank on a very large graph
that is stored in JanusGraph. Thanks! Anything you would like me
to try?
-Joe
On 9/27/2017 12:04 PM, HadoopMarc
wrote:
Hi Joe,
My thoughts were more like:
graph = GraphFactory.open('conf/hadoop-graph/read-hbase-spark-yarn.properties')
result=graph.compute(SparkGraphComputer).program(PageRankVertexProgram.build().create()).submit().get()
along the lines of "Exporting with
BulkDumperVertexProgram" in http://tinkerpop.apache.org/docs/3.2.3/reference/#sparkgraphcomputer
I am curious whether it works!
Marc
On Wednesday, 27 September 2017 at 15:06:19 UTC+2, Joseph Obernberger wrote:
Hi Marc - not sure I understand. I tried this:
gremlin> g=graph.traversal()
==>graphtraversalsource[standardjanusgraph[hbase:[10.22.5.63:2181, 10.22.5.64:2181, 10.22.5.65:2181]], standard]
gremlin> result=graph.compute().program(PageRankVertexProgram.build().create()).submit().get()
Is that what you mean? That does not work on very large
graphs. Even on a small graph (about 9 million nodes), it
took 8 hours to complete, and uses only one machine to do
the work. I'm looking for methods to calculate values on
very large graphs. Any ideas?
Thank you!
-Joe
On 9/26/2017 3:40 PM, HadoopMarc wrote:
Hi Joe,
No, not exactly, because the TinkerPop recipe points at
spark-submit as the source of most of the version
conflicts. Spark-submit is just a big wrapper around the
Spark launch API that sets the environment but does not
do that in an application-friendly way. I would first
try from the gremlin console for which the recipe was
written. Doing the OLAP pagerank in a java project
without spark-submit will require some effort to get the
classpath right.
HTH, Marc
On Tuesday, 26 September 2017 at 00:46:26 UTC+2, Joseph Obernberger wrote:
Thank you Marc. I assume this would be java code
that would be executed via spark-submit?
-Joe
On 9/25/2017 3:21 PM, HadoopMarc wrote:
Hi Joe,
Maybe a suggestion after all. I believe you ran
the PageRankVertexProgram directly on the
JanusGraph instance, but it should also be
possible to run it on a HadoopGraph with
compute(SparkGraphComputer) via JanusGraph's
HBaseInputFormat. That would at least
parallelize the table scan to the number of
HBase regions. In my previous answer I assumed
you did that!
Cheers, Marc
On Monday, 25 September 2017 at 17:24:55 UTC+2, Joseph Obernberger wrote:
It reminds me of that one too! At
present, I'm locked in with HBase, so I
can't make the switch to Cassandra very
easily. I did try:
result = graph.compute().program(PageRankVertexProgram.build().create()).submit().get()
It took a little over 8 hours to run, but
did complete once I adjusted the
hbase.client.scanner.timeout.period
to something very long. Interestingly, I
had to modify that in the included jar
file, not in the file in /etc/hbase/conf.
Would really like to get this time to run
way down, but not sure what other method
to try.
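(For reference, that timeout would normally be set in hbase-site.xml; the value below is just an illustrative one hour in milliseconds, not the value actually used here:)

```xml
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <!-- milliseconds; 3600000 = 1 hour, an example value only -->
  <value>3600000</value>
</property>
```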
-Joe
On 9/22/2017 1:05 PM, HadoopMarc wrote:
Hi Joe,
This question reminds me of an earlier discussion we had on the performance of OLAP traversals for janusgraph-hbase. My conclusion there was that janusgraph-hbase needs a better HBaseInputFormat that delivers more partitions than one partition per HBase region. I guess PageRank suffers from that in the same way. Do you maybe have the option to use Cassandra, which has a configurable cassandra.input.split.size? I did not try this myself.
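(For completeness, with a Cassandra-backed Hadoop graph that would be a single extra line in the properties file; the value below is only an example, and the unit it is measured in should be checked against the Cassandra Hadoop integration docs:)

```
cassandra.input.split.size=256
```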
HTH, Marc
On Friday, 22 September 2017 at 15:41:12 UTC+2, Joseph Obernberger wrote:
Hi All -
I've been experimenting with
SparkGraphComputer, and have it
working, but I'm having performance
issues. What is the best way to run
PageRank against a very large graph
stored inside of JanusGraph?
Thank you!
-Joe
--
You received this message because you are
subscribed to the Google Groups
"JanusGraph users" group.
To unsubscribe from this group and stop
receiving emails from it, send an email to
janusgraph-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/1bf6c7c5-84b6-483e-982c-c299fca3e8ef%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.