
Re: Running OLAP on HBase with SparkGraphComputer fails with Error Container killed by YARN for exceeding memory limits

Roy Yu <7604...@...>
 

you seem to run on cloud infra that reduces your requested 40 Gb to 33 Gb (see https://databricks.com/session_na20/running-apache-spark-on-kubernetes-best-practices-and-pitfalls). Fact of life. 
---------------------
Sorry Marc, I misled you. The error message was generated when I set spark.executor.memory to 30G; when that failed, I increased spark.executor.memory to 40G, and it failed as well. I felt desperate and came here to ask for help.

On Tuesday, December 8, 2020 at 10:35:19 AM UTC+8 Roy Yu wrote:
Hi Marc

Thanks for your immediate response.
I've tried to set spark.yarn.executor.memoryOverhead=10G and re-run the task, and it still failed. From the Spark task UI, I saw that 80% of the processing time is Full GC time. As you said, the 2.6 GB (GZ compressed) region exploding is my root cause. Now I'm trying to reduce my region size to 1 GB; if that still fails, I'm going to configure the HBase HFiles to not use a compressed format.
This was my first time running JanusGraph OLAP, and I think this is a common problem, as an HBase region size of 2.6 GB (compressed) is not large; 20 GB is very common in our production. If the community does not solve the problem, the JanusGraph HBase-based OLAP solution cannot be adopted by other companies either.

On Tuesday, December 8, 2020 at 12:40:40 AM UTC+8 HadoopMarc wrote:
Hi Roy,

There seem to be three things bothering you here:
  1. you did not specify spark.yarn.executor.memoryOverhead, as the exception message says. Easily solved.
  2. you seem to run on cloud infra that reduces your requested 40 Gb to 33 Gb (see https://databricks.com/session_na20/running-apache-spark-on-kubernetes-best-practices-and-pitfalls). Fact of life.
  3. the janusgraph HBaseInputFormat uses entire HBase regions as hadoop partitions, which are fed into spark tasks. The 2.6 GB region size is for compressed binary data which explodes when expanded into java objects. This is your real problem.
I did not follow the latest status of janusgraph-hbase features for the HBaseInputFormat, but you have to somehow use spark with smaller partitions than an entire HBase region.
A long time ago, I had success with skipping the HBaseInputFormat and have spark executors connect to JanusGraph themselves. That is not a quick solution, though.

Best wishes,

Marc

On Monday, December 7, 2020 at 14:10:55 UTC+1, Roy Yu wrote:
Error message:
ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 33.1 GB of 33 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714. 

Graph config:
spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=500 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/mnt/data_1/log/spark2/gc-spark%p.log
spark.executor.cores=1
spark.executor.memory=40960m
spark.executor.instances=3

Region info:
hdfs dfs -du -h /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc
67     134    /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.regioninfo
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.tmp
2.6 G  5.1 G  /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/e
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/f
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/g
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/h
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/i
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/l
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/m
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/recovered.edits
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/s
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/t
root@~$

Can anybody help me?


Re: Running OLAP on HBase with SparkGraphComputer fails with Error Container killed by YARN for exceeding memory limits

Roy Yu <7604...@...>
 

Hi Marc

Thanks for your immediate response.
I've tried to set spark.yarn.executor.memoryOverhead=10G and re-run the task, and it still failed. From the Spark task UI, I saw that 80% of the processing time is Full GC time. As you said, the 2.6 GB (GZ compressed) region exploding is my root cause. Now I'm trying to reduce my region size to 1 GB; if that still fails, I'm going to configure the HBase HFiles to not use a compressed format.
This was my first time running JanusGraph OLAP, and I think this is a common problem, as an HBase region size of 2.6 GB (compressed) is not large; 20 GB is very common in our production. If the community does not solve the problem, the JanusGraph HBase-based OLAP solution cannot be adopted by other companies either.

On Tuesday, December 8, 2020 at 12:40:40 AM UTC+8 HadoopMarc wrote:
Hi Roy,

There seem to be three things bothering you here:
  1. you did not specify spark.yarn.executor.memoryOverhead, as the exception message says. Easily solved.
  2. you seem to run on cloud infra that reduces your requested 40 Gb to 33 Gb (see https://databricks.com/session_na20/running-apache-spark-on-kubernetes-best-practices-and-pitfalls). Fact of life.
  3. the janusgraph HBaseInputFormat uses entire HBase regions as hadoop partitions, which are fed into spark tasks. The 2.6 GB region size is for compressed binary data which explodes when expanded into java objects. This is your real problem.
I did not follow the latest status of janusgraph-hbase features for the HBaseInputFormat, but you have to somehow use spark with smaller partitions than an entire HBase region.
A long time ago, I had success with skipping the HBaseInputFormat and have spark executors connect to JanusGraph themselves. That is not a quick solution, though.

Best wishes,

Marc

On Monday, December 7, 2020 at 14:10:55 UTC+1, Roy Yu wrote:
Error message:
ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 33.1 GB of 33 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714. 

Graph config:
spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=500 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/mnt/data_1/log/spark2/gc-spark%p.log
spark.executor.cores=1
spark.executor.memory=40960m
spark.executor.instances=3

Region info:
hdfs dfs -du -h /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc
67     134    /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.regioninfo
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.tmp
2.6 G  5.1 G  /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/e
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/f
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/g
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/h
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/i
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/l
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/m
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/recovered.edits
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/s
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/t
root@~$

Can anybody help me?


ERROR: Could not commit transaction due to exception during persistence

Gaurav Sehgal <gaurav.s...@...>
 

```
Caused by: java.lang.IllegalArgumentException: Multiple entries with same key: Completeness_metric.Status=org.janusgraph.diskstorage.indexing.StandardKeyInformation@6bf65470 and Completeness_metric.Status=org.janusgraph.diskstorage.indexing.StandardKeyInformation@75c435b8
at com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:215)
at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:209)
at com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:147)
at com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:110)
at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:393)
at org.janusgraph.graphdb.database.IndexSerializer$IndexInfoRetriever$1.get(IndexSerializer.java:165)
at org.janusgraph.diskstorage.indexing.IndexTransaction.getIndexMutation(IndexTransaction.java:82)
at org.janusgraph.diskstorage.indexing.IndexTransaction.delete(IndexTransaction.java:75)
at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(StandardJanusGraph.java:649)
at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:731)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1425)
```

Hello everyone! We are getting this error consistently on our production instance. At first, we thought it was something related to concurrent transactions, but it looks like it is not, as we are unable to reproduce this. Now we are not really sure what other possible situations could cause this error. We'd really appreciate someone's help here.
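(A minimal diagnostic sketch for the Gremlin Console, assuming an open `graph`; it only lists which graph indexes carry a key whose name contains 'Status', so you can see whether the same key is registered twice on one index. The names are taken from the error message and are not verified.)

mgmt = graph.openManagement()
mgmt.getGraphIndexes(Vertex.class).each { idx ->
    idx.getFieldKeys().findAll { it.name().contains('Status') }.each { k ->
        println idx.name() + ' : ' + k.name() + ' -> ' + idx.getIndexStatus(k)
    }
}
mgmt.rollback()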

Janusgraph: v0.5.1
ES: v7.6.2
Cassandra: v3.11.0

Thanks


Re: Running OLAP on HBase with SparkGraphComputer fails with Error Container killed by YARN for exceeding memory limits

HadoopMarc <bi...@...>
 

Hi Roy,

There seem to be three things bothering you here:
  1. you did not specify spark.yarn.executor.memoryOverhead, as the exception message says. Easily solved.
  2. you seem to run on cloud infra that reduces your requested 40 Gb to 33 Gb (see https://databricks.com/session_na20/running-apache-spark-on-kubernetes-best-practices-and-pitfalls). Fact of life.
  3. the janusgraph HBaseInputFormat uses entire HBase regions as hadoop partitions, which are fed into spark tasks. The 2.6 GB region size is for compressed binary data which explodes when expanded into java objects. This is your real problem.
I did not follow the latest status of janusgraph-hbase features for the HBaseInputFormat, but you have to somehow use spark with smaller partitions than an entire HBase region.
A long time ago, I had success with skipping the HBaseInputFormat and have spark executors connect to JanusGraph themselves. That is not a quick solution, though.
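(For reference, a minimal sketch of how point 1 could look in the graph's Spark properties; the values are illustrative, not a recommendation, and newer Spark versions renamed the property to spark.executor.memoryOverhead.)

spark.executor.memory=30g
# explicit off-heap overhead for the YARN container, as suggested by the error message
spark.yarn.executor.memoryOverhead=10g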

Best wishes,

Marc

On Monday, December 7, 2020 at 14:10:55 UTC+1, Roy Yu wrote:

Error message:
ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 33.1 GB of 33 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714. 

Graph config:
spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=500 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/mnt/data_1/log/spark2/gc-spark%p.log
spark.executor.cores=1
spark.executor.memory=40960m
spark.executor.instances=3

Region info:
hdfs dfs -du -h /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc
67     134    /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.regioninfo
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.tmp
2.6 G  5.1 G  /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/e
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/f
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/g
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/h
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/i
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/l
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/m
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/recovered.edits
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/s
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/t
root@~$

Can anybody help me?


Running OLAP on HBase with SparkGraphComputer fails with Error Container killed by YARN for exceeding memory limits

Roy Yu <7604...@...>
 

Error message:
ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 33.1 GB of 33 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714. 

Graph config:
spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=500 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/mnt/data_1/log/spark2/gc-spark%p.log
spark.executor.cores=1
spark.executor.memory=40960m
spark.executor.instances=3

Region info:
hdfs dfs -du -h /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc
67     134    /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.regioninfo
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.tmp
2.6 G  5.1 G  /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/e
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/f
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/g
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/h
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/i
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/l
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/m
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/recovered.edits
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/s
0      0      /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/t
root@~$

Can anybody help me?


Re: Running OLAP on HBase with SparkGraphComputer fails on shuffle/Pregel message pass

Roy Yu <7604...@...>
 

I have the same problem. Have you ever solved it?


On Wednesday, May 30, 2018 at 5:30:14 PM UTC+8 yevg...@... wrote:
Hello.

Recently we faced an issue with running PageRank on HBase: for comparison purposes we loaded our graph from Cassandra into an HBase deployment of the same size, and unlike on Cassandra, all attempts to run PageRank on that graph fail with the initial cause pointing to SparkExecutor line 165 in spark-gremlin:

viewOutgoingRDD.flatMapToPair(messageFunction).reduceByKey(graphRDD.partitioner().get(), reducerFunction) :

It always happens with a message in the logs that the container requested more memory than allowed by its configuration, like:

Reason: Container killed by YARN for exceeding memory limits. 43.0 GB of 42 GB physical memory used.

According to the logs, the error consistently seems to occur in the first message-pass phase of the vertex program, right after the initial iteration.

Here is one of the configurations we tried for running OLAP on HBase, with the same Spark-related properties we use to perform queries on Cassandra:

gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

storage.backend=hbase
storage.hbase.snapshot-name=jsnapshot

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseSnapshotInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph

#
# Spark Configuration
#
spark.master=yarn
spark.deploy.mode=cluster
spark.executor.memory=12g
spark.driver.memory=2g
spark.executor.cores=4
spark.executor.instances=12
spark.serializer=org.apache.spark.serializer.KryoSerializer

We tried to increase the memory per executor as much as we could and tried tweaking gremlin.spark.graphStorageLevel, without any success.
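(For reference, the gremlin.spark.graphStorageLevel property mentioned above takes a Spark StorageLevel name; an illustrative, not prescriptive, setting:)

gremlin.spark.graphStorageLevel=MEMORY_AND_DISK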

Did anybody experience similar issues running SparkGraphComputer with HBaseInputFormat/HBaseSnapshotInputFormat, or possibly on other backends?

Best regards,
Evgeniy Ignatiev.



Re: Not sure if vertex centric index is being used

chrism <cmil...@...>
 

I have the same question. Is anyone able to shed some light on this issue?
There is (as usual) a certain way of constructing a Gremlin query to get the local index used,
but there is no way to know whether it is used or not, which makes such an effort rather a guess that relies on observed speed improvements.


On Wednesday, October 10, 2018 at 8:53:36 PM UTC+11 m...@... wrote:
It does say isFitted=true in the profile output, but it doesn't mention the relation index name like it does in the case of composite indexes.
Is it that for vertex-centric indexes the index name is not displayed in the profile output? Is this a bug?

gremlin> g.V().has('project', 'projectId', 138).outE('hasHouseUnit').has('hasHouseUnitBlock', 'C').profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(project), projectI...                     1           1           0.641    37.98
    \_condition=(~label = project AND projectId = 138)
    \_isFitted=false
    \_query=multiKSQ[1]@2147483647
    \_index=byProjectIdComposite
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0.021
  optimization                                                                                 0.212
JanusGraphVertexStep([hasHouseUnitBlock.eq(C)])                      679         679           1.048    62.02
    \_condition=(hasHouseUnitBlock = C AND type[hasHouseUnit])
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@85318457
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0.074
                                            >TOTAL                     -           -           1.690        -
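(As a side note, a minimal sketch, assuming a management handle on the same graph and the edge label from the profile output, that at least confirms the relation index exists and is ENABLED:)

mgmt = graph.openManagement()
label = mgmt.getEdgeLabel('hasHouseUnit')
// prints each vertex-centric (relation) index defined on this label together with its status
mgmt.getRelationIndexes(label).each { idx ->
    println idx.name() + ' -> ' + idx.getIndexStatus()
}
mgmt.rollback()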




Re: Janusgraph Hadoop Spark standalone cluster - Janusgraph job always creates constant number 513 of Spark tasks

HadoopMarc <bi...@...>
 

Hi Varun,

Not a solution, but someone in the thread below explained the 257 magic number for OLAP on a Cassandra cluster:
https://groups.google.com/g/janusgraph-users/c/IdrRyIefihY

Marc


On Friday, December 4, 2020 at 20:48:28 UTC+1, Varun Ganesh wrote:

Hi,

I am facing this same issue. I am using SparkGraphComputer to read from Janusgraph backed by cassandra. `g.V().count()` takes about 3 minutes to load just two rows that I have in the graph.

I see that about 257 tasks are created. In my case, I am seeing parallelism in the spark cluster that I am using but each task seems to take about ~5 seconds on average and there is no obvious reason why.

(I can attach a page from the Spark UI and also the properties file I am using, but I am unable to find the option to)

Would appreciate any input on solving this. Thank you!

Varun

On Friday, December 6, 2019 at 5:04:55 AM UTC-5 s...@... wrote:
Hi Dimitar,

I'm experiencing the same problem of a seemingly uncontrollable static number of Spark tasks - did you ever figure out how to fix this?

Thanks,
Sture


On Friday, October 18, 2019 at 4:19:19 PM UTC+2, dim...@... wrote:
Hello,

I have set up JanusGraph 0.4.0 with Hadoop 2.9.0 and Spark 2.4.4 in a K8s cluster.
I connect to JanusGraph from the Gremlin console and execute:
gremlin> og
==>graphtraversalsource[hadoopgraph[cassandra3inputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
==>1889

It takes 25 min to do the count! It took the same time when there were no vertices (i.e. count -> 0). The Spark job shows that 513 tasks were run! The number of tasks is always a constant 513, regardless of the number of vertices.
I have set "spark.sql.shuffle.partitions=4" in the Spark job's environment, but again the number of Spark tasks was 513! My assumption is that JanusGraph somehow specifies this number of tasks when it submits the job to Spark.
The questions are:
- Why is the JanusGraph job submitted to Spark always parallelized into 513 tasks?
- How can I manage the number of tasks that are created for a JanusGraph job?
- How can I minimize the execution time of an OLAP query for this small graph (the OLTP query takes less than a second to execute)?

Thanks,
Dimitar


Re: Janusgraph Hadoop Spark standalone cluster - Janusgraph job always creates constant number 513 of Spark tasks

Varun Ganesh <operatio...@...>
 

Hi,

I am facing this same issue. I am using SparkGraphComputer to read from Janusgraph backed by cassandra. `g.V().count()` takes about 3 minutes to load just two rows that I have in the graph.

I see that about 257 tasks are created. In my case, I am seeing parallelism in the spark cluster that I am using but each task seems to take about ~5 seconds on average and there is no obvious reason why.

(I can attach a page from the Spark UI and also the properties file I am using, but I am unable to find the option to)

Would appreciate any input on solving this. Thank you!

Varun

On Friday, December 6, 2019 at 5:04:55 AM UTC-5 s...@... wrote:
Hi Dimitar,

I'm experiencing the same problem of a seemingly uncontrollable static number of Spark tasks - did you ever figure out how to fix this?

Thanks,
Sture


On Friday, October 18, 2019 at 4:19:19 PM UTC+2, dim...@... wrote:
Hello,

I have set up JanusGraph 0.4.0 with Hadoop 2.9.0 and Spark 2.4.4 in a K8s cluster.
I connect to JanusGraph from the Gremlin console and execute:
gremlin> og
==>graphtraversalsource[hadoopgraph[cassandra3inputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
==>1889

It takes 25 min to do the count! It took the same time when there were no vertices (i.e. count -> 0). The Spark job shows that 513 tasks were run! The number of tasks is always a constant 513, regardless of the number of vertices.
I have set "spark.sql.shuffle.partitions=4" in the Spark job's environment, but again the number of Spark tasks was 513! My assumption is that JanusGraph somehow specifies this number of tasks when it submits the job to Spark.
The questions are:
- Why is the JanusGraph job submitted to Spark always parallelized into 513 tasks?
- How can I manage the number of tasks that are created for a JanusGraph job?
- How can I minimize the execution time of an OLAP query for this small graph (the OLTP query takes less than a second to execute)?

Thanks,
Dimitar


Re: Configuring Transaction Log feature

Sandeep Mishra <sandy...@...>
 

Pawan,
Can you check for the following in your logs: "Loaded unidentified ReadMarker start time ..."?
It seems your ReadMarker is starting from 1970, so it tries to read all changes since then.

Regards,
Sandeep

On Saturday, November 28, 2020 at 8:48:18 PM UTC+8 shr...@... wrote:
One correction to the last post, in the line below.

    JanusGraphTransaction tx = graph.buildTransaction().logIdentifier("TestLog").start();



On Saturday, 28 November 2020 at 18:16:09 UTC+5:30 Pawan Shriwas wrote:
Hi Sandeep,

Please see the Java code and properties below, which I am trying locally with Cassandra CQL as the backend. This code is not giving me the change-log events that I can get via the Gremlin console with the same script and properties. Please let me know if anything needs to be modified here in the code or properties.

<!-- Java Code -->
package com.example.graph;

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.core.JanusGraphVertex;
import org.janusgraph.core.log.ChangeProcessor;
import org.janusgraph.core.log.ChangeState;
import org.janusgraph.core.log.LogProcessorFramework;
import org.janusgraph.core.log.TransactionId;

public class TestLog {
public static void listenLogsEvent(){
JanusGraph graph = JanusGraphFactory.open("/home/ist/Downloads/IM/jgraphdb_local.properties");
LogProcessorFramework logProcessor = JanusGraphFactory.openTransactionLog(graph);

logProcessor.addLogProcessor("TestLog").
    setProcessorIdentifier("TestLogCounter").
    setStartTimeNow().
    addProcessor(new ChangeProcessor(){
        @Override
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
        System.out.println("tx--"+tx.toString());
        System.out.println("txId--"+txId.toString());
        System.out.println("changeState--"+changeState.toString());
       }
    }).
    build();
for(int i=0;i<=10;i++) {
        System.out.println("going to add ="+i);
    JanusGraphTransaction tx = graph.buildTransaction().logIdentifier("PawanTestLog").start();
    JanusGraphVertex a = tx.addVertex("TimeL");
    a.property("type", "HOLD");
    a.property("serialNo", "XS31B4");
    tx.commit();
        System.out.println("Vertex committed ="+a.toString());
}
}
public static void main(String[] args) {
System.out.println("starting main");
listenLogsEvent();
}
}

<!----- graph properties------->
gremlin.graph=org.janusgraph.core.JanusGraphFactory
graph.name=TestGraph
storage.backend = cql
storage.hostname = localhost
storage.cql.keyspace=janusgraphcql
query.fast-property = true
storage.lock.wait-time=10000
storage.batch-loading=true

Thanks in advance.

Thanks,
Pawan


On Saturday, 28 November 2020 at 16:19:20 UTC+5:30 sa...@... wrote:
Pawan,
Can you elaborate more on the program in which you are trying to embed the script?
Regards,
Sandeep

On Sat, 28 Nov 2020, 13:48 Pawan Shriwas, <shr...@...> wrote:
Hey Jason,

The same thing happens with me as well: the above script works well in the Gremlin console, but when we use it in Java we are not getting anything in the process() section as a callback. Could you help with this?


On Wednesday, 7 February 2018 at 20:28:41 UTC+5:30 Jason Plurad wrote:
It means that it will use the 'storage.backend' value as the storage. See the code in GraphDatabaseConfiguration.java. It looks like your only choice is 'default', and it seems like the option is there for the future possibility to use a different backend.
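(As an illustration, a log configuration line in the graph properties would look roughly like the following; 'addedPerson' matches the log identifier used in the code below, and 'default' simply falls back to the graph's storage backend.)

log.addedPerson.backend=default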

The code in the docs seemed to work ok, other than a minor change in the setStartTime() parameters. You can cut and paste this code into the Gremlin Console to use with the prepackaged distribution.

import java.util.concurrent.atomic.*;
import org.janusgraph.core.log.*;
import java.util.concurrent.*;

graph = JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties');

totalHumansAdded = new AtomicInteger(0);
totalGodsAdded = new AtomicInteger(0);
logProcessor = JanusGraphFactory.openTransactionLog(graph);
logProcessor.addLogProcessor("addedPerson").
        setProcessorIdentifier("addedPersonCounter").
        setStartTime(Instant.now()).
        addProcessor(new ChangeProcessor() {
            public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
                for (v in changeState.getVertices(Change.ADDED)) {
                    if (v.label().equals("human")) totalHumansAdded.incrementAndGet();
                    System.out.println("total humans = " + totalHumansAdded);
                }
            }
        }).
        addProcessor(new ChangeProcessor() {
            public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
                for (v in changeState.getVertices(Change.ADDED)) {
                    if (v.label().equals("god")) totalGodsAdded.incrementAndGet();
                    System.out.println("total gods = " + totalGodsAdded);
                }
            }
        }).
        build()

tx = graph.buildTransaction().logIdentifier("addedPerson").start();
u = tx.addVertex(T.label, "human");
u.property("name", "proteros");
u.property("age", 36);
tx.commit();

If you inspect the keyspace in Cassandra afterwards, you'll see that a separate table is created for "ulog_addedPerson".

Did you have some example code of what you are attempting?



On Wednesday, February 7, 2018 at 5:55:58 AM UTC-5, Sandeep Mishra wrote:
Hi Guys,

We are trying to use the transaction log feature of JanusGraph, which is not working as expected. No callback is received at
public void process(JanusGraphTransaction janusGraphTransaction, TransactionId transactionId, ChangeState changeState) {

The JanusGraph documentation says the value for log.[X].backend is 'default'.
Not sure what exactly it means. Does it mean HBase, which is being used as the backend for data?

Please let me know if anyone has configured it.

Thanks and Regards,
Sandeep Mishra

--
You received this message because you are subscribed to a topic in the Google Groups "JanusGraph users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/janusgraph-users/JN4ZsB9_DMM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to janusgr...@....


Re: Use index for sorting

toom <to...@...>
 

The problem with using a custom value for null is that we need to choose a value for each data type, and hope that nobody will try to use this particular value. I suppose it is feasible for data types like string, date or double, but not for boolean.

Toom.

On Friday, December 4, 2020 at 2:22:02 AM UTC+1 li...@... wrote:
No, null support is an optional feature for graph providers. JanusGraph does not allow null values and I don't think they will be supported (in the near future).

Apart from the solution suggested by Marc, is it possible for you to come up with some custom value to represent null?

Best regards,
Boxuan

On Friday, December 4, 2020 at 3:17 AM, toom <t...@...> wrote:

Hi Marc,

Thank you for your response.
If I understand correctly, with TinkerPop 3.5 I will be able to sort on properties with missing values. That is good news.
Do you know if JanusGraph 0.6.0 will be released with that version?

Regarding the impact of the step order on index use, I wrote a strategy [1] that puts HasStep and OrderStep before FilterStep if they follow a GraphStep.

Best regards,

Toom.

On Thursday, December 3, 2020 at 8:17:27 AM UTC+1 HadoopMarc wrote:
Hi Toom,

No solution, but the exception that you mention comes from TinkerPop:

Apparently, you want all selected vertices, including the ones with null properties, so I would wait for TinkerPop 3.5 and in the meantime use your own workaround for a single filter criterion and do the ordering outside Gremlin for more complex sets of filtering criteria.

Best wishes,      Marc

On Wednesday, December 2, 2020 at 08:13:05 UTC+1, t...@... wrote:
Hello,

I'm using JanusGraph with Cassandra (0.5.2) and ElasticSearch.

I try to optimize my queries and use the mixed indexes as much as possible, in particular for sorting, but I have some difficulties:

It is not possible to sort by properties that can have missing values (or I get a "The property does not exist as the key has no associated value for the provided element"). Therefore I used ".order().by(coalesce(values('closingDate'), new Date()))" but in this case, the index is not used.

If there is only one sorting criterion, I probably can do something like:

g.inject(1).union(
  g.V().hasLabel('Case').has('closingDate').order().by('closingDate'),
  g.V().hasLabel('Case').hasNot('closingDate'))

But what is my best option if I want to use several criteria?


I also note that the FilterRankingStrategy can have a negative effect on performance when there are filters that don't use an index. For example, the following query is faster without step reordering.

g.V().hasLabel('Case').has('closingDate').order().by('closingDate').filter(out('attachment').has('file'))

FilterRankingStrategy swaps the order() and filter() steps, and then the index is not used for sorting.

Any help will be much appreciated.

Toom.

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/c6bcb878-bfc2-4d8f-a3ee-38cc214382bcn%40googlegroups.com.


Re: Use index for sorting

"Li, Boxuan" <libo...@...>
 

No, null support is an optional feature for graph providers. JanusGraph does not allow null values and I don't think they will be supported (in the near future).

Apart from the solution suggested by Marc, is it possible for you to come up with some custom value to represent null?

Best regards,
Boxuan

On Friday, December 4, 2020 at 3:17 AM, toom <to...@...> wrote:


Hi Marc,

Thank you for your response.
If I understand correctly, with TinkerPop 3.5 I will be able to sort on properties with missing values. That is good news.
Do you know if JanusGraph 0.6.0 will be released with that version?

Regarding the impact of the step order on index use, I wrote a strategy [1] that puts HasStep and OrderStep before FilterStep if they follow a GraphStep.

Best regards,

Toom.

On Thursday, December 3, 2020 at 8:17:27 AM UTC+1 HadoopMarc wrote:
Hi Toom,

No solution, but the exception that you mention comes from TinkerPop:

Apparently, you want all selected vertices, including the ones with null properties, so I would wait for TinkerPop 3.5 and in the meantime use your own workaround for a single filter criterion and do the ordering outside Gremlin for more complex sets of filtering criteria.

Best wishes,      Marc

On Wednesday, December 2, 2020 at 08:13:05 UTC+1, t...@... wrote:
Hello,

I'm using JanusGraph with Cassandra (0.5.2) and ElasticSearch.

I try to optimize my queries and use the mixed indexes as much as possible, in particular for sorting, but I have some difficulties:

It is not possible to sort by properties that can have missing values (or I get a "The property does not exist as the key has no associated value for the provided element"). Therefore I used ".order().by(coalesce(values('closingDate'), new Date()))" but in this case, the index is not used.

If there is only one sorting criterion, I probably can do something like:

g.inject(1).union(
  g.V().hasLabel('Case').has('closingDate').order().by('closingDate'),
  g.V().hasLabel('Case').hasNot('closingDate'))

But what is my best option if I want to use several criteria?


I also note that the FilterRankingStrategy can have a negative effect on performance when there are filters that don't use an index. For example, the following query is faster without step reordering.

g.V().hasLabel('Case').has('closingDate').order().by('closingDate').filter(out('attachment').has('file'))

FilterRankingStrategy swaps the order() and filter() steps, and then the index is not used for sorting.

Any help will be much appreciated.

Toom.

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgra...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/c6bcb878-bfc2-4d8f-a3ee-38cc214382bcn%40googlegroups.com.


Re: Use index for sorting

toom <to...@...>
 


Hi Marc,

Thank you for your response.
If I understand correctly, with TinkerPop 3.5 I will be able to sort on properties with missing values. That is good news.
Do you know if JanusGraph 0.6.0 will be released with that version?

Regarding the impact of the step order on index use, I wrote a strategy [1] that puts HasStep and OrderStep before FilterStep if they follow a GraphStep.

Best regards,

Toom.

[1] https://gist.github.com/To-om/e7e406d85dcdd65bc2a533960b50a1d2

On Thursday, December 3, 2020 at 8:17:27 AM UTC+1 HadoopMarc wrote:
Hi Toom,

No solution, but the exception that you mention comes from TinkerPop:

Apparently, you want all selected vertices, including the ones with null properties, so I would wait for TinkerPop 3.5 and in the meantime use your own workaround for a single filter criterion and do the ordering outside Gremlin for more complex sets of filtering criteria.

Best wishes,      Marc

On Wednesday, December 2, 2020 at 08:13:05 UTC+1, t...@... wrote:
Hello,

I'm using JanusGraph with Cassandra (0.5.2) and ElasticSearch.

I try to optimize my queries and use the mixed indexes as much as possible, in particular for sorting, but I have some difficulties:

It is not possible to sort by properties that can have missing values (or I get a "The property does not exist as the key has no associated value for the provided element"). Therefore I used ".order().by(coalesce(values('closingDate'), new Date()))" but in this case, the index is not used.

If there is only one sorting criterion, I probably can do something like:

g.inject(1).union(
  g.V().hasLabel('Case').has('closingDate').order().by('closingDate'),
  g.V().hasLabel('Case').hasNot('closingDate'))

But what is my best option if I want to use several criteria?


I also note that the FilterRankingStrategy can have a negative effect on performance when there are filters that don't use an index. For example, the following query is faster without step reordering.

g.V().hasLabel('Case').has('closingDate').order().by('closingDate').filter(out('attachment').has('file'))

FilterRankingStrategy swaps the order() and filter() steps, and then the index is not used for sorting.

Any help will be much appreciated.

Toom.


Re: Use index for sorting

HadoopMarc <bi...@...>
 

Hi Toom,

No solution, but the exception that you mention comes from TinkerPop:

Apparently, you want all selected vertices, including the ones with null properties, so I would wait for TinkerPop 3.5 and in the meantime use your own workaround for a single filter criterion and do the ordering outside Gremlin for more complex sets of filtering criteria.

Best wishes,      Marc

On Wednesday, December 2, 2020 at 08:13:05 UTC+1, t...@... wrote:

Hello,

I'm using JanusGraph with Cassandra (0.5.2) and ElasticSearch.

I try to optimize my queries and use the mixed indexes as much as possible, in particular for sorting, but I have some difficulties:

It is not possible to sort by properties that can have missing values (or I get a "The property does not exist as the key has no associated value for the provided element"). Therefore I used ".order().by(coalesce(values('closingDate'), new Date()))" but in this case, the index is not used.

If there is only one sorting criterion, I probably can do something like:

g.inject(1).union(
  g.V().hasLabel('Case').has('closingDate').order().by('closingDate'),
  g.V().hasLabel('Case').hasNot('closingDate'))

But what is my best option if I want to use several criteria?


I also note that the FilterRankingStrategy can have a negative effect on performance when there are filters that don't use an index. For example, the following query is faster without step reordering.

g.V().hasLabel('Case').has('closingDate').order().by('closingDate').filter(out('attachment').has('file'))

FilterRankingStrategy swaps the order() and filter() steps, and then the index is not used for sorting.

Any help will be much appreciated.

Toom.


Re: Run JanusGraph/Cassandra/ElasticSearch in production in a VM with 8GB/RAM

HadoopMarc <bi...@...>
 

Hi, it does not seem impossible but you should really test it and ask yourself the questions:
  • do you really need gremlin to query your data (instead of SQL)?
  • do you really need Elasticsearch for non-equality matches (read about CompositeIndex and MixedIndex; see the sketch after this list)?
  • for incidental gremlin queries, isn't it enough to load the SQL database into a TinkerGraph when needed?
  • what does this Marc really know?
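(As a minimal sketch of the second point above; the schema here is made up for illustration, and 'search' must match a configured index backend:)

mgmt = graph.openManagement()
name = mgmt.makePropertyKey('name').dataType(String.class).make()
// composite index: exact-match lookups, served by the storage backend alone
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
// mixed index: range and full-text matches, requires Elasticsearch (or another index backend)
mgmt.buildIndex('byNameMixed', Vertex.class).addKey(name).buildMixedIndex('search')
mgmt.commit()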
Best wishes,    Marc

On Thursday, December 3, 2020 at 00:58:03 UTC+1, p...@... wrote:

Hi, is it possible to run JanusGraph/Cassandra/ElasticSearch in production in a VM with 8 GB of RAM?

The system has more reads than writes: we create 10 vertices per hour and the system has an average of 20 concurrent users reading the database. It is just a Category/User/Ads manager with only 300k vertices or a little more, and basic traversals.

Could I run into problems, or is it feasible?

How can I determine whether it is feasible? Thank you.


Run JanusGraph/Cassandra/ElasticSearch in production in a VM with 8GB/RAM

"p...@pwill.com.br" <pw...@...>
 

Hi, is it possible to run JanusGraph/Cassandra/ElasticSearch in production in a VM with 8 GB of RAM?

The system has more reads than writes: we create 10 vertices per hour and the system has an average of 20 concurrent users reading the database. It is just a Category/User/Ads manager with only 300k vertices or a little more, and basic traversals.

Could I run into problems, or is it feasible?

How can I determine whether it is feasible? Thank you.


Use index for sorting

toom <to...@...>
 

Hello,

I'm using JanusGraph with Cassandra (0.5.2) and ElasticSearch.

I try to optimize my queries and use the mixed indexes as much as possible, in particular for sorting, but I have some difficulties:

It is not possible to sort by properties that can have missing values (or I get a "The property does not exist as the key has no associated value for the provided element"). Therefore I used ".order().by(coalesce(values('closingDate'), new Date()))" but in this case, the index is not used.

If there is only one sorting criterion, I probably can do something like:

g.inject(1).union(
  g.V().hasLabel('Case').has('closingDate').order().by('closingDate'),
  g.V().hasLabel('Case').hasNot('closingDate'))

But what is my best option if I want to use several criteria?


I also note that the FilterRankingStrategy can have a negative effect on performance when there are filters that don't use an index. For example, the following query is faster without step reordering.

g.V().hasLabel('Case').has('closingDate').order().by('closingDate').filter(out('attachment').has('file'))

FilterRankingStrategy swaps the order() and filter() steps, and then the index is not used for sorting.
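(One way to check whether FilterRankingStrategy is what removes the index usage is to profile the same traversal with that strategy switched off; a sketch, assuming the Gremlin Console:)

g.withoutStrategies(org.apache.tinkerpop.gremlin.process.traversal.strategy.optimization.FilterRankingStrategy).
  V().hasLabel('Case').has('closingDate').order().by('closingDate').
  filter(out('attachment').has('file')).
  profile()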

Any help will be much appreciated.

Toom.


Re: throw NullPointerException when query with hasLabel() script

阳生丙 <ouyang....@...>
 

It is reproducible in my environment, and this problem did not appear when the JanusGraph server started.

It has always been there since I ran into it for the first time, continuing until now.

I have another server running, and it does not have this problem.

What information do you want me to provide? The only log messages produced while I submit the above Gremlin script are posted above.


On Monday, November 30, 2020 at 6:52:31 PM UTC+8, HadoopMarc wrote:

Hi,

Can you please send something going wrong that is reproducible? Below you find an example on the latest janusgraph-full-0.5.2 where hasLabel() works fine.

marc:/tera/lib/janusgraph-full-0.5.2$ bin/janusgraph.sh start
Forking Cassandra...
Running `nodetool statusthrift`..... OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9200)........ OK (connected to 127.0.0.1:9200).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)...... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.

marc:/tera/lib/janusgraph-full-0.5.2$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
gremlin> g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"))
==>graphtraversalsource[emptygraph[empty], standard]
gremlin> g.V().elementMap()
==>[id:4320,label:monster,name:cerberus]
==>[id:8320,label:demigod,name:hercules,age:30]
==>[id:16632,label:god,name:pluto,age:4000]
==>[id:4272,label:titan,name:saturn,age:10000]
==>[id:8368,label:location,name:sea]
==>[id:4200,label:monster,name:nemean]
==>[id:20728,label:monster,name:hydra]
==>[id:24824,label:location,name:tartarus]
==>[id:8440,label:god,name:neptune,age:4500]
==>[id:4224,label:god,name:jupiter,age:5000]
==>[id:12536,label:human,name:alcmene,age:45]
==>[id:4344,label:location,name:sky]
gremlin> g.V().hasLabel('monster').elementMap()
==>[id:4320,label:monster,name:cerberus]
==>[id:4200,label:monster,name:nemean]
==>[id:20728,label:monster,name:hydra]
gremlin> g.V().hasLabel('monster').has('name', 'cerberus').elementMap()
==>[id:4320,label:monster,name:cerberus]
gremlin>


Best wishes,    Marc

On Monday, November 30, 2020 at 08:52:34 UTC+1, ouy...@... wrote:
to HadoopMarc:

It is not about the double quote. The problem is: when the Gremlin script contains a hasLabel() clause, the JanusGraph server throws a NullPointerException.

I use the console and the Gremlin driver to fire these queries to the remote JanusGraph server.



On Saturday, November 28, 2020 at 2:48:23 AM UTC+8, HadoopMarc wrote:
Hi,

Do you mean that the only difference between the first and the second script is the presence of the closing double quote? I see the stacktrace is from Gremlin Server: did you use the Gremlin Console to fire the query? What happens if you make a GLV/bytecode connection instead of a connection for groovy scripts, that is, using:

g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"))

Best wishes,    Marc

On Friday, November 27, 2020 at 19:05:24 UTC+1, ouy...@... wrote:
query with gremlin script:  "g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value')"
the returned result is expected and correct.

but query with gremlin script: "g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value')

throws NullPointerException.

the exception stack info is as follow:

2020-11-27 12:07:35,227 WARN  [gremlin-server-exec-10] AbstractEvalOpProcessor.java:311 - Exception processing a script on request [RequestMessage{, requestId=cb1f0ab6-cee6-47c4-bbc1-196f27b1b14c, op='eval', processor='', args={gremlin=g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value'), batchSize=64}}].
java.lang.NullPointerException
        at org.janusgraph.graphdb.util.ElementHelper.getValues(ElementHelper.java:41)
        at org.janusgraph.graphdb.query.condition.PredicateCondition.evaluate(PredicateCondition.java:68)
        at org.janusgraph.graphdb.query.condition.And.evaluate(And.java:55)
        at org.janusgraph.graphdb.query.graph.GraphCentricQuery.matches(GraphCentricQuery.java:153)
        at org.janusgraph.graphdb.query.QueryProcessor.lambda$getFilterIterator$2(QueryProcessor.java:133)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:652)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
        at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
        at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at com.google.common.collect.Iterators$5.hasNext(Iterators.java:547)
        at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
        at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
        at org.janusgraph.graphdb.util.MultiDistinctOrderedIterator.hasNext(MultiDistinctOrderedIterator.java:75)
        at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:149)
        at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:197)
        at org.apache.tinkerpop.gremlin.server.op.AbstractOpProcessor.handleIterator(AbstractOpProcessor.java:146)
        at org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor.lambda$evalOpInternal$5(AbstractEvalOpProcessor.java:264)
        at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:278)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


and the problem is reproducible,  is it a bug?

environment:
janusgraph version: 0.5.2
backend cassandra version: 3.11



Re: OLAP, Hadoop, Spark and Cassandra

HadoopMarc <bi...@...>
 

Hi Mladen,

Interesting read! Spark is not very sensitive to the number of tasks. I believe that for OLAP on HadoopGraph the optimum is partitions of 256 MB or so. Larger is difficult to hold in memory for reasonably sized executors; smaller gives too much overhead. OLAP with janusgraph-hbase is much harder, because the partition size is determined by the HBase regions, which need to be large (10 GB). Also note that the entire graph needs to fit into the total memory of all executors, because graph traversing is shuffle-heavy and spilling to disk takes forever.

Best wishes,    Marc

On Monday, November 30, 2020 at 19:09:15 UTC+1, Mladen Marović wrote:

I know I'm quite late to the party, but for future reference - the number of input partitions in Spark depends on the partitioning of the source. In case of cassandra, partitioning is determined by the number of tokens each node gets (as configured by `num_tokens` in `cassandra.yaml`), which is set to 256 by default. So, if you have a 3-node cassandra cluster, by default each node should get 256 tokens, which would result in 3*256 = 768 tokens total. Since Spark reads directly from cassandra (if you're using `org.janusgraph.hadoop.formats.cql.CqlInputFormat`), that translates to 768 partitions in the input Spark RDD, or 768 tasks during processing. Add to that 1 task that collects results, or something similar, and you end up at 769. At least that was my experience.

The default value of 256 for `num_tokens` made sense in older versions, but in Cassandra 3.x a new token allocation algorithm was implemented to improve performance for operations requiring token-range scans, which is precisely what Spark does. I experimented a bit with smaller values (e.g. 16) and managed to drastically reduce the number of tasks when scanning the entire graph. For further reading, I recommend this article.
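(A sketch of the corresponding cassandra.yaml fragment; the values are only illustrative, and changing num_tokens on an existing cluster is not a drop-in change:)

# cassandra.yaml, per node: fewer tokens per node means fewer token ranges and fewer Spark input partitions
num_tokens: 16
# Cassandra 3.x token allocation algorithm; the keyspace name is an example
allocate_tokens_for_keyspace: janusgraph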



On Thursday, December 5, 2019 at 9:28:26 AM UTC+1 s...@... wrote:
Answering my own question - it turned out I had a mixup of the keyspaces used between the two instances.

By default, conf/hadoop-graph/read-cql.properties reads

janusgraphmr.ioformat.conf.storage.cassandra.keyspace

While for CQL it should read

janusgraphmr.ioformat.conf.storage.cql.keyspace

Also - as I made a 'named' (ve_graph) graph I had to point to that one rather than the janusgraph keyspace.

Problem 1 solved. Now to the next one - how can I lower the number of 'partitions' Spark is using (here 769: '... on localhost (executor driver) (769/769)')?

On Wednesday, December 4, 2019 at 11:46:42 PM UTC+1, Sture Lygren wrote:
Hi,

I'm trying to get JanusGraph 0.4.0 with a Cassandra (CQL) backend setup and running as OLAP while still keeping OLTP active in order to do graph updates. I've been searching high and low for some guidance, but so far without any luck. Hopefully someone here could tune in and help?

Here's where I'm at currently

  • local Hadoop running according to https://old-docs.janusgraph.org/0.4.0/hadoop-tp3.html
  • gremlin server started as /bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration.yaml
  • gremlin-server-configuration.yaml points to init.groovy script doing the traversal mappings for OLTP and OLAP
def globals = [:]
ve = ConfiguredGraphFactory.open("ve_graph")
OLAPGraph = GraphFactory.open('conf/hadoop-graph/read-cql.properties')
globals << [g : ve.traversal(), sg: OLAPGraph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)]
  • conf/hadoop-graph/read-cql.properties reads
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
  • Running the gremlin shell I have
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/sture/Scripts/janusgraph-0.4.0-hadoop2/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/sture/Scripts/janusgraph-0.4.0-hadoop2/lib/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
plugin activated: tinkerpop.server
plugin activated: tinkerpop.tinkergraph
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[655848fc-b46e-40be-8174-f0dc42cdabd4]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[655848fc-b46e-40be-8174-f0dc42cdabd4] - type ':remote console' to return to local mode
gremlin> g
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin>
gremlin> sg
==>graphtraversalsource[hadoopgraph[cqlinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().has('lbl','System').count()
==>68
gremlin> sg.V().has('lbl','System').count()
  • The job is running for some time and while finishing the gremlin-server.log reads
253856 [Executor task launch worker for task 768] INFO  org.apache.spark.executor.Executor  - Finished task 768.0 in stage 0.0 (TID 768). 2388 bytes result sent to driver
253858 [task-result-getter-1] INFO  org.apache.spark.scheduler.TaskSetManager  - Finished task 768.0 in stage 0.0 (TID 768) in 6809 ms on localhost (executor driver) (769/769)
253861 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - ResultStage 0 (fold at SparkStarBarrierInterceptor.java:101) finished in 161.427 s
253861 [task-result-getter-1] INFO  org.apache.spark.scheduler.TaskSchedulerImpl  - Removed TaskSet 0.0, whose tasks have all completed, from pool
253876 [SparkGraphComputer-boss] INFO  org.apache.spark.scheduler.DAGScheduler  - Job 0 finished: fold at SparkStarBarrierInterceptor.java:101, took 161.598267 s
253888 [SparkGraphComputer-boss] INFO  org.apache.spark.rdd.MapPartitionsRDD  - Removing RDD 1 from persistence list
253901 [block-manager-slave-async-thread-pool-0] INFO  org.apache.spark.storage.BlockManager  - Removing RDD 1
  • However - the count (==> ) reads 0 for the sg traversal
I've most likely missed some crucial point here, but I'm not able to spot it. Please help.



Re: SimplePath query is slower in 6 node vs 3 node Cassandra cluster

Varun Ganesh <operatio...@...>
 

Hi Boxuan,

Thank you for getting back to me. Please find my responses below:

> Did you check the hardware differences? 
Yes I can confirm that the two clusters are identical except for the number of nodes.

> the data involved in your query is probably distributed across nodes
This was our initial guess as well. However, if that was the case, we should technically observe this slowness for all the queries that we try. But it is only observed for "path" queries.

For instance, here's an example of another traversal query where we observe the SAME latency across the 3 and 6 node clusters:
g.V().hasLabel('label_B').has('some_id', 123).has('data.name', 1234567).both('sample_edge').valueMap('data.field1', 'data.field2').next(10)

> Then there would be fewer round-trips happening within the 3-node cluster
I also want to point out that we are not running the Janusgraph in embedded mode (where it is colocated with Cassandra), instead it is running separately on its own server nodes

> Of course with larger cluster you can achieve higher throughput
Interestingly we are not observing any difference in the throughput (i.e. the maximum queries per second that can be handled without seeing timeouts) between the two clusters

Would appreciate any input on where/how we could possibly investigate further.

Thank you!
Varun

On Thursday, November 26, 2020 at 11:19:32 AM UTC-5 li...@... wrote:
Hi,

> why the query is 3x slower on 6 nodes

Did you check the hardware differences? Probably the 6-node cluster has slower network, less memory, slower disk, etc.
Another possibility that I can think of is, the data involved in your query is probably distributed across nodes. Since your 3-node cassandra cluster has 3x replication factor, I would presume all data you have is available on every node. Then there would be fewer round-trips happening within the 3-node cluster.
Generally it makes sense to me that the latency of a small cluster is shorter than that of a large cluster, as long as both clusters are not fully loaded. Of course with larger cluster you can achieve higher throughput.

> the profile step above shows a time taken of >1000ms

This can be a bug in profiling. If you can provide a minimal example to reproduce, that would be very helpful.

Best regards,
Boxuan


On Nov 25, 2020, at 6:04 AM, Varun Ganesh <oper...@...> wrote:

Just an additional note,  you may have noticed that the profile step above shows a time taken of >1000ms. I do not know why this is the case.

When run on the console without profile, it reflects the true time taken:
 gremlin> clockWithResult(10) { graph.tx().rollback(); g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').limit(1).next() }
 ==>130.9545608

Thanks!

On Tuesday, November 24, 2020 at 4:35:22 PM UTC-5 Varun Ganesh wrote:
Hello,

I am currently using Janusgraph version 0.5.2. I have a graph with about 18 million vertices and 25 million edges.

I have two versions of this graph, one backed by a 3 node Cassandra cluster and another backed by 6 Cassandra nodes (both with 3x replication factor)

I am running the below query on both of them:

g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').next()

The issue is that this query takes ~130ms on the 3 node cluster whereas it takes ~400ms on the 6 node cluster.

I have tried running ".profile()" on both versions and the outputs are almost identical in terms of the steps and time taken.

g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').limit(1).profile()

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(label_A), o...                     1           1           4.582     0.39
    \_condition=(~label = label_A AND some_id = 123 AND data.name = value1)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]@8000
    \_index=someVertexByNameComposite
  optimization                                                                                 0.028
  optimization                                                                                 0.907
  backend-query                                                        1                       3.012
    \_query=someVertexByNameComposite:multiKSQ[1]@8000
    \_limit=8000
RepeatStep([JanusGraphVertexStep(BOTH,[...                     2           2        1167.493    99.45
  HasStep([data.name.eq(...                                                          803.247
  JanusGraphVertexStep(BOTH,[...                           12934       12934         334.095
    \_condition=type[sample_edge]
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
    \_multi=true
    \_vertices=264
    optimization                                                                               0.073
    backend-query                                                    266                       5.640
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
    optimization                                                                               0.028
    backend-query                                                  12689                     312.544
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
  PathFilterStep(simple)                                           12441       12441          10.980
  JanusGraphMultiQueryStep(RepeatEndStep)                           1187        1187          11.825
  RepeatEndStep                                                        2           2         810.468
RangeGlobalStep(0,1)                                                   1           1           0.419     0.04
PathStep([value(data.name)])                                 1           1           1.474     0.13
                                            >TOTAL                     -           -        1173.969        -

I'd really appreciate some input on figuring out why the query is 3x slower on 6 nodes.

I realise that you may require more context. Happy to provide more information as required!

 Thank you!
 

-- 
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/6d2483f7-062a-4a95-98b2-6b4aafa87cd3n%40googlegroups.com.
