CQL for OLAP issue with Scylla as backend, in both local and YARN mode
rakesh...@...
Hi All, I am unable to run any analytics (OLAP) on JanusGraph with Scylla as the backend. I tried both local and YARN mode on an AWS EMR cluster.
I built the distribution archive from here (from branch Issue_985_spark_via_cql). Following are the properties given in conf/hadoop-graph/read-cql.
Full stack error while running in yarn mode:
Is anything required on the classpath, or are there any required jars? Also, what is the problem with local mode? Is there any alternative for this purpose (analytics on JanusGraph using Spark)? Currently I am running connected components using GraphFrames. Your help is appreciated, thanks in advance :)
|
|
HadoopMarc <bi...@...>
Hi,
Regarding spark-yarn: this has been included in the spark-gremlin plugin for the gremlin-console distributed with TinkerPop since TinkerPop-3.3.1, but it has not yet made it into the spark-gremlin maven dependency. Any project with JanusGraph and spark-yarn OLAP queries has to explicitly include the spark-yarn maven dependency itself. If you work in the gremlin-console of the JanusGraph distribution, you can add the spark-yarn jars manually, like in:
Regarding the OLAP query output: you did not specify which line of code, either in the gremlin console or in your project, should have produced any output. If you did not take the graph.traversal().withComputer() approach, take a look at:
Cheers, Marc
|
|
rakeshsh...@...
Thanks for your response Marc, I tried the link you provided above, but it still shows the same error. Regarding the OLAP query in local mode, I am running the following queries:
    // properties as provided above, with spark.master=local[4]
    graph = GraphFactory.open('conf/hadoop-graph/read-cql.properties')
    g = graph.traversal().withComputer(SparkGraphComputer)
    g.V().limit(5)
    g.V().count()
For both of the above queries, more than 500 tasks run and finish, but the output screen stays empty. If I run without SparkGraphComputer, I get the proper output for limit or has queries.
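For reference, a local-mode properties file for reading JanusGraph data over CQL into SparkGraphComputer might look roughly like the sketch below. It is modeled on the read-cql.properties example shipped with later JanusGraph releases, not on the file used in this thread; the hostname and keyspace are placeholder assumptions, and class names on the Issue_985_spark_via_cql branch may differ.

```properties
# Sketch only: not the poster's file. Values marked "assumed" are placeholders.

# Hadoop-Gremlin graph and input format
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true

# Connection to the storage backend that holds the graph
janusgraphmr.ioformat.conf.storage.backend=cql
# assumed: address of a Scylla/Cassandra node
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042
# assumed: keyspace the graph was written to
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph

# Cassandra input settings
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true

# SparkGraphComputer settings (local mode with 4 worker threads, as in the thread)
spark.master=local[4]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
```

The keyspace and hostname under janusgraphmr.ioformat.conf must match the ones the graph was originally written with; pointing them at a different or empty keyspace is one way a job can run to completion without errors while finding no vertices.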
On Saturday, May 4, 2019, at 12:03:34 AM UTC+5:30, HadoopMarc wrote:
|
|
rakeshsh...@...
throws below error:
Used read-cql-properties as follow:
When I change to spark.master=local[4], it starts some execution and prints empty output after all tasks finish.
On Thursday, May 2, 2019, at 11:18:18 PM UTC+5:30, rak...@... wrote:
|
|
Nitin Poddar <hitk.ni...@...>
Hi Rakesh, were you able to resolve this issue? I am getting the exact same error message and there isn't much help available online for this. Could you please help me here? Thanks, Nitin
On Thursday, May 2, 2019 at 1:48:18 PM UTC-4, Rakesh Sharma wrote:
|
|
Evgeniy Ignatiev <yevgeniy...@...>
Hello Nitin, It looks like your installation lacks the required Spark jars; see
https://groups.google.com/d/msg/gremlin-users/LYv-cvZ66hU/TJUTvLzCAAAJ
You have to provide a full installation.
Best regards, Evgeniy Ignatiev.
On 5/27/2020 6:02 PM, Nitin Poddar wrote:
|
|
rakesh...@...
Hi Nitin, Yes, I was able to resolve the issue mentioned above. Please follow the steps below. Steps: create a bash file, e.g. jg.sh
I was unable to finish the job, as we have billions of linkages in the graph, and after that I didn't get time to start on OLAP again. Let me know if you make any progress or find best practices for tuning OLAP queries on large-scale graphs.
On Wednesday, May 27, 2020 at 8:59:47 PM UTC+5:30, Evgeniy Ignatiev wrote:
|
|
Nitin Poddar <hitk.ni...@...>
Thank you Evgenii, I followed the post and have been trying to resolve the issue for almost a week now. I might be getting some version conflicts as well. Will try again.
On Wednesday, May 27, 2020 at 11:29:47 AM UTC-4, Evgeniy Ignatiev wrote:
|
|
Nitin Poddar <hitk.ni...@...>
Hi Rakesh, Thank you for your reply. I will definitely share performance tuning best practices and learnings as I find more. However, can you please share the properties file (read-cql- Thanks, Nitin
On Wednesday, May 27, 2020 at 12:45:07 PM UTC-4, Rakesh Sharma wrote:
|
|
rakesh...@...
Sure, please find the read-cql-dynalloc.properties. You can change the parameters as per your requirements.
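The attached file itself is not preserved here. Going only by its name, it presumably enables Spark dynamic allocation on YARN; a minimal sketch of the Spark section of such a file, using standard Spark configuration keys, could look like the following. All values are illustrative assumptions, not the poster's settings.

```properties
# Sketch only: assumed values, not the contents of read-cql-dynalloc.properties.

# Run on YARN with the driver inside the Gremlin Console process
spark.master=yarn
spark.submit.deployMode=client

# Let Spark grow and shrink the executor pool with the workload
spark.dynamicAllocation.enabled=true
# The external shuffle service is normally required for dynamic allocation on YARN
spark.shuffle.service.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=20

# Per-executor resources; size these to the cluster's node types
spark.executor.memory=4g
spark.executor.cores=2

# Serializer settings used with JanusGraph's Hadoop formats
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
```

These lines would replace the spark.master=local[4] section of a local-mode file; the graph reader and storage connection settings stay the same.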
On Wednesday, May 27, 2020 at 10:28:10 PM UTC+5:30, Nitin Poddar wrote:
|
|
Nitin Poddar <hitk.ni...@...>
Thanks Rakesh. I will try with the changes from your properties file and let you know if I face any issues. I have been struggling with this for over a week now. :) Just curious, I see that you did not use Elasticsearch for indexing, any reason why? It can significantly improve your OLAP performance. Thanks, Nitin
On Wednesday, May 27, 2020 at 1:15:16 PM UTC-4, Rakesh Sharma wrote:
|
|
Evgeniy Ignatiev <yevgeniy...@...>
How does ES affect OLAP performance? Correct me if I am wrong,
but unless it is explicitly used in custom Spark code, the JanusGraph
integration will not leverage it, and it is definitely not
contacted when graph data is loaded into memory for Spark
VertexProgram execution.
Best regards,
On 27.05.2020 21:55, Nitin Poddar wrote:
|
|