JanusGraph 0.5.3 SparkGraph Computer with YARN Error - java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$GetNewApplicationRequestProto cannot be cast to org.apache.hadoop.hbase.shaded.com.google.protobuf.Message
Hello!
I am currently trying to set up SparkGraphComputer using JanusGraph with a CQL storage backend and an Elasticsearch index backend, and am receiving an error when trying to complete a simple vertex count traversal in the gremlin console:

gremlin> hadoop_graph = GraphFactory.open('conf/hadoop-graph/olap/olap-cassandra-HadoopGraph-YARN.properties')

Relevant cluster details:
Properties file:
###########
# Gremlin #
###########
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
###################
# Index - Elastic #
###################
janusgraphmr.ioformat.conf.index.search.backend=elasticsearch
#Elastic Basic Auth
janusgraphmr.ioformat.conf.index.search.elasticsearch.http.auth.basic.password=[password]
janusgraphmr.ioformat.conf.index.search.elasticsearch.http.auth.basic.username=[username]
janusgraphmr.ioformat.conf.index.search.elasticsearch.http.auth.type=[authtype]
#Hosts
janusgraphmr.ioformat.conf.index.search.hostname=[hosts]
janusgraphmr.ioformat.conf.index.search.index-name=[myindexname]
metrics.console.interval=60000
metrics.enabled=false
#################
# Storage - CQL #
#################
schema.default=none
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.batch-loading=true
janusgraphmr.ioformat.conf.storage.buffer-size=10000
janusgraphmr.ioformat.conf.storage.cql.keyspace=[keyspace]
#HOSTS and PORTS
janusgraphmr.ioformat.conf.storage.hostname=[hosts]
janusgraphmr.ioformat.conf.storage.password=[password]
janusgraphmr.ioformat.conf.storage.username=[username]
cassandra.output.native.port=9042
#InputFormat configuration
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true
####################################
# SparkGraphComputer Configuration #
####################################
spark.master=yarn
spark.submit.deployMode=client
#Spark Job Configurations
spark.sql.shuffle.partitions=1000
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
spark.driver.maxResultSize=2G
# Gremlin and Serializer Configuration
gremlin.spark.persistContext=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
spark.executor.extraClassPath=/home/hadoop/janusgraph-full-0.5.3/*
# Special Yarn Configuration (WIP)
spark.yarn.archives=/home/hadoop/janusgraph-full-0.5.3/lib.zip
spark.yarn.jars=/home/hadoop/janusgraph-full-0.5.3/lib/*
spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:./lib.zip/*:
spark.yarn.dist.archives=/home/hadoop/janusgraph-full-0.5.3/lib.zip
spark.yarn.dist.files=/home/hadoop/janusgraph-full-0.5.3/lib/janusgraph-cql-0.5.3.jar
spark.yarn.shuffle.stopOnFailure=true
#Spark Driver and Executors
spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.executor.extraClassPath=janusgraph-cql-0.5.3.jar:./lib.zip/*:./lib/keys/*
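The ClassCastException in the subject names the HBase-shaded protobuf Message class. One hypothesis (an assumption, not confirmed in this thread) is that the hbase-shaded-* jars bundled with janusgraph-full end up on the YARN classpath via lib.zip and shadow the protobuf classes that Hadoop's RPC layer expects. Since only the CQL backend is used here, a hedged sketch of rebuilding lib.zip without those jars, using the paths from the configuration above:

```shell
# Hypothesis only: rebuild lib.zip so the relocated protobuf inside the
# hbase-shaded-* jars cannot shadow YARN's own RPC/protobuf classes.
cd /home/hadoop/janusgraph-full-0.5.3/lib
rm -f ../lib.zip
zip -r ../lib.zip . -x 'hbase-shaded-*'
```

If this turns out to be the conflict, the same exclusion would apply to the spark.yarn.jars and executor classpath entries above.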
In addition to this, I've also tried the same configuration file and implementation steps listed above on an older AWS EMR release, which was running Hadoop 2.7, and received a similar error in the gremlin console:
Does anyone have any tips on resolving these issues, or has any experience with implementing JanusGraph 0.5.3 on AWS EMR? Thanks!
|
|
ScriptExecutor Deprecated but Used in gremlin.bat
fredrick.eisele@...
I see use of the deprecated ScriptExecutor in gremlin.bat: http://tinkerpop.apache.org/javadocs/3.2.10/full/org/apache/tinkerpop/gremlin/groovy/jsr223/ScriptExecutor.html It seems the fix is to replace it. Should an issue be opened to fix this?
|
|
Re: JMX authentication for cassandra
hadoopmarc@...
Hi Vinayak,
This question is probably better addressed to: https://cassandra.apache.org/community/ as I cannot remember having seen this discussed in the JanusGraph community. Best wishes, Marc
|
|
Re: Threads are unresponsive for some time after a particular amount of data transfer(119MB)
hadoopmarc@...
Hi Vinayak,
For embedded use of janusgraph, see: https://docs.janusgraph.org/getting-started/basic-usage/#loading-with-an-index-backend and replace the properties file with the one currently used by gremlin server. With embedded use, you can simply do (if your graph is not too large):

vertices = g.V().toList()
edges = g.E().toList()
subGraph = g.E().subgraph('sub').cap('sub').next()

Best wishes, Marc
|
|
Re: Threads are unresponsive for some time after a particular amount of data transfer(119MB)
Vinayak Bali
Hi Marc, I went through some blogs but didn't find a method to connect to JanusGraph from Java in embedded mode. We are using Cassandra as a backend and CQL to connect to it. I am not sure how to achieve the following:

1. Connecting to JanusGraph from Java in embedded mode, with the data already present in Cassandra (CQL).
2. Is there any way to get the data from Cassandra into memory?

Please share blogs or other approaches to successfully test the above. Thanks & Regards, Vinayak
On Fri, Mar 12, 2021 at 9:38 PM <hadoopmarc@...> wrote: Hi Vinayak,
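A minimal sketch of the embedded use Marc describes, opening the graph in-process with JanusGraphFactory instead of going through gremlin server; the hostname and keyspace are placeholders, not values from this thread:

```java
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class EmbeddedExample {
    public static void main(String[] args) {
        // Open JanusGraph embedded against the existing Cassandra data;
        // host and keyspace below are placeholders.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "cql")
                .set("storage.hostname", "127.0.0.1")
                .set("storage.cql.keyspace", "janusgraph")
                .open();

        // Traversals now run in-process, with no server round-trips
        // and no websocket result-size limits.
        GraphTraversalSource g = graph.traversal();
        System.out.println(g.V().count().next());

        graph.close();
    }
}
```

This requires the janusgraph-cql dependency on the classpath and network access to the Cassandra cluster.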
|
|
Re: JMX authentication for cassandra
Vinayak Bali
Hi Marc, The article was useful and I completed the JMX authentication successfully. But when I allow password authentication for Cassandra by changing the following lines in cassandra.yaml, it stops working.

Before:
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer

After:
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer

The surrounding comments in cassandra.yaml read:

# Authentication backend, implementing IAuthenticator; used to identify users
# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
#   users. It keeps usernames and hashed passwords in system_auth.credentials table.
#   Please increase system_auth keyspace replication factor if you use this authenticator.

# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthorizer,
# CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
#   increase system_auth keyspace replication factor if you use this authorizer.

The comments here suggest increasing the replication factor, but I don't think that's the issue. Please suggest a blog or the changes to be made to enable password authentication for Cassandra. Thanks & Regards, Vinayak
On Thu, Mar 11, 2021 at 9:58 PM <hadoopmarc@...> wrote: Hi Vinayak,
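One common follow-up step, offered here as an assumption rather than something confirmed in the thread: once PasswordAuthenticator is enabled, every CQL client must authenticate, including JanusGraph itself. A sketch of the relevant JanusGraph properties, using Cassandra's default superuser as a placeholder:

```properties
# Placeholder credentials; Cassandra ships with a default
# cassandra/cassandra superuser when PasswordAuthenticator is enabled.
storage.backend=cql
storage.username=cassandra
storage.password=cassandra
```

If JanusGraph (or gremlin server) was previously connecting without credentials, "it stops working" is exactly the symptom this would produce.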
|
|
Re: .JanusGraph/Elastic - Too many dynamic script compilations error for LIST type properties
Abhay Pandit
Hi Naresh, I too used to get this exception. This was solved after moving to Janusgraph v0.5.2. Hope this helps you. Thanks, Abhay
On Sat, 13 Mar 2021 at 22:20, <hadoopmarc@...> wrote: Hi Naresh,
|
|
Re: .JanusGraph/Elastic - Too many dynamic script compilations error for LIST type properties
hadoopmarc@...
Hi Naresh,
Yes, elasticsearch, I should have recognized the "painless" scripting! This can mean the following things:
|
|
Re: .JanusGraph/Elastic - Too many dynamic script compilations error for LIST type properties
Naresh Babu Y
Hello Marc, Thanks for the quick reply. I am not using gremlin server. I am using Spark: I read all messages per batch, then open a JanusGraph transaction, add the batch records, and commit. Here are the details:

JanusGraph version: 0.3.2
Storage backend: HBase
Index backend: Elasticsearch

Please let me know if you have any clue at the JanusGraph transaction level or any configuration (because I am not using gremlin server). Thanks, Naresh
On Sat, 13 Mar 2021, 9:54 pm , <hadoopmarc@...> wrote: Hi Naresh,
|
|
Re: .JanusGraph/Elastic - Too many dynamic script compilations error for LIST type properties
hadoopmarc@...
Hi Naresh,
I guess that the script that the error message refers to is the script that your client executes remotely at gremlin server. You may want to study: https://tinkerpop.apache.org/docs/current/reference/#parameterized-scripts which, depending on how you coded the frequent updates, can dramatically diminish the time spent on script compilation by gremlin server. This is also what the exception message means with "use indexed, or scripts with parameters instead". Best wishes, Marc
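As an illustration of the parameterized scripts linked above, a sketch using the TinkerPop Java driver (the host and the binding value are placeholders): instead of baking values into the script text, which forces gremlin server to compile a fresh script on every call, pass them as bindings so the compiled script is cached and reused.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class ParameterizedScript {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.build("localhost").create(); // placeholder host
        Client client = cluster.connect();

        // Anti-pattern: a new script text per value, so each call is
        // compiled separately and counts against the compilation budget.
        // client.submit("g.V(" + 123 + ").valueMap()");

        // Parameterized: one script text, value passed as a binding;
        // compiled once, then served from the script cache.
        Map<String, Object> params = new HashMap<>();
        params.put("vid", 123); // placeholder vertex id
        client.submit("g.V(vid).valueMap()", params).all().join();

        cluster.close();
    }
}
```

This requires the gremlin-driver dependency and a running gremlin server.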
|
|
Re: Incomplete javadoc
hadoopmarc@...
Hi Boxuan,
Thanks for pointing this out. Now I can provide this link when needed. Then, there are still broken links to RelationIdentifier in janusgraph-core, see the last link in my original post. Best wishes, Marc
|
|
.JanusGraph/Elastic - Too many dynamic script compilations error for LIST type properties
Naresh Babu Y
Hi,
We are using JanusGraph (version 0.3.2) with Elasticsearch 6. When updating a node/vertex with a property of LIST cardinality which is mixed-indexed, we frequently get the exception below and the data is not stored/updated.

{type=illegal_argument_exception, reason=failed to execute script, caused_by={type=general_script_exception, reason=Failed to compile inline script [if(ctx._source["property123"] == null) ctx._source["property123"] = [];ctx._source["property123"].add("jkkhhj#1");] using lang [painless], caused_by={type=circuit_breaking_exception, reason=[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting, bytes_wanted=0, bytes_limit=0}}}

We have a requirement to update LIST-type properties frequently, but changing max_compilations_rate to a large number is not a good idea. Please let me know if there is any other option to handle this in JanusGraph. Thanks, Naresh
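For reference, the [script.max_compilations_rate] setting named in the error is a dynamic Elasticsearch cluster setting, so it can be changed without a restart. As the poster notes, raising it is a mitigation rather than a fix; the host and the rate value below are placeholders:

```shell
# Placeholder host and rate; raises the painless compilation budget
# from the default 75/5m. A stop-gap, not a root-cause fix.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"script.max_compilations_rate": "150/5m"}}'
```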
|
|
Re: Count Query Optimization
Boxuan Li
Apart from rewriting the query, there are some config options (https://docs.janusgraph.org/basics/configuration-reference/#query) worth trying:

1) Turn on query.batch
2) Turn off query.fast-property
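As a sketch, the two options above map to these entries in the graph's properties file (verify the defaults against the linked reference for your JanusGraph version):

```properties
# Fetch data from the storage backend in parallel batches.
query.batch=true
# Do not eagerly pre-fetch all vertex properties.
query.fast-property=false
```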
|
|
Re: Count Query Optimization
AMIYA KUMAR SAHOO
Hi Marc,
Vinayak's query has a filter on the inV property (property1 = B), hence I did not stop at the edge itself. If this kind of query is frequent, a decision can be made whether it makes sense to keep the same value duplicated on both the vertex and the edge. That would help eliminate the traversal to the adjacent vertex. Regards, Amiya
|
|
Re: Incomplete javadoc
Boxuan Li
It’s available here: https://javadoc.io/doc/org.janusgraph/janusgraph-driver/latest/org/janusgraph/graphdb/relations/RelationIdentifier.html
https://github.com/janusgraph/janusgraph/commit/b96aa6d26f74d3dc8f7404212bddeefe2d0790b4 moved this class from janusgraph-core to janusgraph-driver with the same package name. Best regards, Boxuan
|
|
Re: Threads are unresponsive for some time after a particular amount of data transfer(119MB)
hadoopmarc@...
Hi Vinayak,
As the link shows, this is an issue in TinkerPop, so it cannot be solved here. Of course, you can look for workarounds. As sending result sets of multiple hundreds of MB is not a typical client operation, you might consider opening the graph in embedded mode, that is, without using gremlin server. Best wishes, Marc
|
|
Incomplete javadoc
hadoopmarc@...
As of janusgraph 0.5.0 the RelationIdentifier seems to be missing from the javadoc:
https://janusgraph.org/apidocs/org/janusgraph/graphdb/relations/RelationIdentifier.html
https://javadoc.io/doc/org.janusgraph/janusgraph-core/0.4.1/org/janusgraph/graphdb/relations/RelationIdentifier.html

The RelationIdentifier certainly still exists in the gremlin console and even on other latest javadoc pages:
https://javadoc.io/doc/org.janusgraph/janusgraph-core/latest/org/janusgraph/graphdb/relations/RelationIdentifierUtils.html

Am I right? Marc
|
|
Re: Count Query Optimization
hadoopmarc@...
Hi all,
I also thought about the vertex centric index first, but I am afraid that the VCI can only help to filter the edges to follow; it does not help in counting the edges. A better way to investigate is to leave out the final inV() step. So, e.g., you can count the number of distinct v2 ids with:

g.V().has('property1', 'A').outE().has('property1','E').id().map{it.get().getInVertexId()}.dedup().count()

Note that E().id() returns RelationIdentifier objects that contain the edge id, the inVertexId and the outVertexId. This should diminish the number of storage backend calls. Best wishes, Marc
|
|
Re: Count Query Optimization
AMIYA KUMAR SAHOO
Hi Vinayak,

For query 1: what is the degree centrality of the vertices having property A, and what percentage of their out edges have property E? If that percentage is small, a VCI will help to speed up this traversal.

You can give the query below a try; I am not sure if it will speed things up.

g.V().has('property1', 'A').
  outE().has('property1','E').
  inV().has('property1', 'B').
  dedup().by(path()).
  count()
On Fri, 12 Mar 2021, 13:30 Vinayak Bali, <vinayakbali16@...> wrote:
|
|
Threads are unresponsive for some time after a particular amount of data transfer(119MB)
Vinayak Bali
Hi All,

We are connecting to JanusGraph using Java. A cluster connection with the gremlin driver is used for the connectivity. At the start we were getting an out of memory error, but tweaking some settings in gremlin-server.yaml resolved that issue. The issue raised on StackOverflow:

Changes made in gremlin-server.yaml:
writeBufferLowWaterMark: 9500000
writeBufferHighWaterMark: 10000000

Now every query gets stuck at 119 MB for some time, i.e. approximately 5 minutes, and then starts working again. Attaching a screenshot of the error.

Gremlin server configuration:
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 16384
maxContentLength: 2000000000
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 128
writeBufferLowWaterMark: 9500000
writeBufferHighWaterMark: 10000000
threadPoolWorker: 30
gremlinPool: 0

How can the issue be solved? Thanks & Regards, Vinayak
|
|