
Re: JMX authentication for cassandra

hadoopmarc@...
 

Hi Vinayak,

This question is probably better addressed to:
https://cassandra.apache.org/community/

as I cannot remember having seen this discussed in the JanusGraph community.

Best wishes,

Marc


Re: Threads are unresponsive for some time after a particular amount of data transfer(119MB)

hadoopmarc@...
 

Hi Vinayak,

For embedded use of janusgraph, see:
https://docs.janusgraph.org/getting-started/basic-usage/#loading-with-an-index-backend
and replace the properties file with the one currently used by gremlin server.

With embedded use, you can simply do (if your graph is not too large):
vertices = g.V().toList()
edges = g.E().toList()
subGraph = g.E().subgraph('sub').cap('sub').next()
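
A minimal java sketch of the same thing (assuming a conf/janusgraph-cql.properties file that points at your existing Cassandra keyspace; adjust the path and settings to your setup):

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class EmbeddedJanusGraph {
    public static void main(String[] args) {
        // opens the graph in-process; no gremlin server involved
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql.properties");
        GraphTraversalSource g = graph.traversal();
        System.out.println("vertices: " + g.V().count().next());
        graph.close();
    }
}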

Best wishes,   Marc


Re: Threads are unresponsive for some time after a particular amount of data transfer(119MB)

Vinayak Bali
 

Hi Marc,

I went through some blogs but didn't find a way to connect to janusgraph in embedded mode from java. We are using Cassandra as a backend and cql to connect to it. I am not sure how to achieve the following:
1. Connecting to janusgraph from java in embedded mode, with the data already present in Cassandra (cql).
2. Is there any way to get the data from Cassandra into memory?
Please share blogs or other approaches to test the above successfully.

Thanks & Regards,
Vinayak

On Fri, Mar 12, 2021 at 9:38 PM <hadoopmarc@...> wrote:
Hi Vinayak,

As the link shows, this is an issue in TinkerPop itself, so it cannot be solved here. Of course, you can look for workarounds. Since sending result sets of several hundred MB is not a typical client operation, you might consider opening the graph in embedded mode, that is, without using gremlin server.

Best wishes,   Marc


Re: JMX authentication for cassandra

Vinayak Bali
 

Hi Marc,

The article was useful and I completed the JMX authentication successfully. But when I enable password authentication for Cassandra by changing the following lines in cassandra.yaml, it stops working.

Before: 
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
After:
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
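
Note: with PasswordAuthenticator enabled, every client has to log in, including janusgraph itself. As a sketch, assuming the default cassandra/cassandra superuser that Cassandra creates on first start, the janusgraph properties file would then also need:

storage.username=cassandra
storage.password=cassandra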

# Authentication backend, implementing IAuthenticator; used to identify users
# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
#   users. It keeps usernames and hashed passwords in system_auth.credentials table.
#   Please increase system_auth keyspace replication factor if you use this authenticator.
# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthorizer,
# CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
#   increase system_auth keyspace replication factor if you use this authorizer.

The comments here suggest increasing the system_auth replication factor, but I don't think that's the issue. Please suggest a blog or the changes needed to enable password authentication for Cassandra.

Thanks & Regards,
Vinayak 


Re: JanusGraph/Elastic - Too many dynamic script compilations error for LIST type properties

Abhay Pandit
 

Hi Naresh,

I too used to get this exception. It was solved after moving to JanusGraph v0.5.2.

Hope this helps you.

Thanks,
Abhay


On Sat, 13 Mar 2021 at 22:20, <hadoopmarc@...> wrote:
Hi Naresh,

Yes, elasticsearch, I should have recognized the "painless" scripting! This can mean the following things:
  • your use case may be unusual; would it be possible to introduce a groupBy step in spark that first gathers all property updates for a vertex into one update call?
  • the default value of script.max_compilations_rate may really be too low for your use case, so it is worth trying to increase it (the elasticsearch docs do not discourage it). I think this should be done outside janusgraph, just using the elastic APIs.
  • the janusgraph code for calling elasticsearch with scripts is suboptimal; I did not investigate this beyond checking for existing issues (none). This option will not help you now; if you want to create an issue on the janusgraph github, please specify what your system setup is, what the update rates are, etc. You would also have to check whether your issue also holds for janusgraph 0.4.1 or 0.5.3, because 0.3.x is end of life.
Best wishes,    Marc


Re: JanusGraph/Elastic - Too many dynamic script compilations error for LIST type properties

hadoopmarc@...
 

Hi Naresh,

Yes, elasticsearch, I should have recognized the "painless" scripting! This can mean the following things:
  • your use case may be unusual; would it be possible to introduce a groupBy step in spark that first gathers all property updates for a vertex into one update call?
  • the default value of script.max_compilations_rate may really be too low for your use case, so it is worth trying to increase it (the elasticsearch docs do not discourage it). I think this should be done outside janusgraph, just using the elastic APIs (a sketch follows below this list).
  • the janusgraph code for calling elasticsearch with scripts is suboptimal; I did not investigate this beyond checking for existing issues (none). This option will not help you now; if you want to create an issue on the janusgraph github, please specify what your system setup is, what the update rates are, etc. You would also have to check whether your issue also holds for janusgraph 0.4.1 or 0.5.3, because 0.3.x is end of life.
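
For the second bullet, a sketch with the elasticsearch low-level REST client (the 200/5m value is only an illustration; tune it to your update rate):

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class RaiseCompilationRate {
    public static void main(String[] args) throws Exception {
        // connect to the elasticsearch REST endpoint (adjust host/port to your cluster)
        RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();
        Request request = new Request("PUT", "/_cluster/settings");
        // raise the dynamic script compilation limit cluster-wide
        request.setJsonEntity("{\"transient\":{\"script.max_compilations_rate\":\"200/5m\"}}");
        Response response = client.performRequest(request);
        System.out.println(response.getStatusLine());
        client.close();
    }
}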
Best wishes,    Marc


Re: JanusGraph/Elastic - Too many dynamic script compilations error for LIST type properties

Naresh Babu Y
 

Hello Marc,
Thanks for quick reply.

I am not using gremlin server.

I am using spark: I read all messages per batch, then open a JanusGraph transaction, add the batch records and commit it.

Here are the details:
JanusGraph version: 0.3.2
Storage system: Hbase
Index : elastic


Please let me know if you have any clue at the JanusGraph transaction level or any configuration option (because I am not using gremlin server).

Thanks,
Naresh


On Sat, 13 Mar 2021, 9:54 pm , <hadoopmarc@...> wrote:
Hi Naresh,

I guess that the script the error message refers to is the script that your client executes remotely on gremlin server. You may want to study:
https://tinkerpop.apache.org/docs/current/reference/#parameterized-scripts

which, depending on how you coded the frequent updates, can dramatically diminish the time spent on script compilation by gremlin server. This is also what the exception message means by "use indexed, or scripts with parameters instead".

Best wishes,    Marc


Re: JanusGraph/Elastic - Too many dynamic script compilations error for LIST type properties

hadoopmarc@...
 

Hi Naresh,

I guess that the script the error message refers to is the script that your client executes remotely on gremlin server. You may want to study:
https://tinkerpop.apache.org/docs/current/reference/#parameterized-scripts

which, depending on how you coded the frequent updates, can dramatically diminish the time spent on script compilation by gremlin server. This is also what the exception message means by "use indexed, or scripts with parameters instead".
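
For illustration, a sketch with the gremlin java driver: the script text stays constant, so gremlin server compiles it only once, and only the bindings change between calls (the vertex id and property name here are hypothetical):

import java.util.HashMap;
import java.util.Map;
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class ParameterizedUpdate {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.build("localhost").port(8182).create();
        Client client = cluster.connect();
        String script = "g.V(vid).property('property123', val).iterate()";
        for (int i = 0; i < 100; i++) {
            Map<String, Object> params = new HashMap<>();
            params.put("vid", 4104L);       // hypothetical vertex id
            params.put("val", "value" + i);
            client.submit(script, params).all().get();  // one compilation, many executions
        }
        cluster.close();
    }
}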

Best wishes,    Marc


Re: Incomplete javadoc

hadoopmarc@...
 

Hi Boxuan,

Thanks for pointing this out. Now I can provide this link when needed.

That said, there are still broken links to RelationIdentifier in janusgraph-core; see the last link in my original post.

Best wishes,    Marc


JanusGraph/Elastic - Too many dynamic script compilations error for LIST type properties

Naresh Babu Y
 

Hi,
We are using janusgraph (version 0.3.2) with elastic 6.

When updating a node/vertex with a property of LIST cardinality that is part of a mixed index, we frequently get the exception below and the data is not stored/updated.
{type=illegal_argument_exception, reason=failed to execute script, caused_by={type=general_script_exception, reason=Failed to compile inline script 
[if(ctx._source["property123"] == null) ctx._source["property123"] = [];ctx._source["property123"].add("jkkhhj#1");] using lang [painless], caused_by={type=circuit_breaking_exception, reason=[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting, bytes_wanted=0, bytes_limit=0}}}

We have a requirement to update LIST type properties frequently, but changing max_compilations_rate to a large number is not a good idea.

Please let me know if there is any other option to handle this in janusgraph.

Thanks,
Naresh


Re: Count Query Optimization

Boxuan Li
 

Apart from rewriting the query, there are some config options (https://docs.janusgraph.org/basics/configuration-reference/#query) worth trying:

1) Turn on query.batch
2) Turn off query.fast-property
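
A sketch of setting these when opening the graph programmatically (the storage settings are placeholders for your own):

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class OpenWithQueryOptions {
    public static void main(String[] args) {
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "cql")        // placeholder backend settings
                .set("storage.hostname", "127.0.0.1")
                .set("query.batch", true)             // batch backend queries
                .set("query.fast-property", false)    // avoid pre-fetching all properties
                .open();
        graph.close();
    }
}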


Re: Count Query Optimization

AMIYA KUMAR SAHOO
 

Hi Marc,

Vinayak's query has a filter on an inV property (property1 = B), hence I did not stop at the edge itself.

If this kind of query is frequent, you can decide whether it makes sense to keep the same value duplicated on both the vertex and the edge. That would eliminate the traversal to the target vertex.
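
For example, if the value were also stored on the edge under a (hypothetical) key "targetProperty1", the count could stop at the edges:

g.V().has("property1", "A").outE().has("property1", "E").has("targetProperty1", "B").count()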

Regards,
Amiya


Re: Incomplete javadoc

Boxuan Li
 

On Mar 12, 2021, at 11:56 PM, hadoopmarc@... wrote:



Re: Threads are unresponsive for some time after a particular amount of data transfer(119MB)

hadoopmarc@...
 

Hi Vinayak,

As the link shows, this is an issue in TinkerPop itself, so it cannot be solved here. Of course, you can look for workarounds. Since sending result sets of several hundred MB is not a typical client operation, you might consider opening the graph in embedded mode, that is, without using gremlin server.

Best wishes,   Marc


Incomplete javadoc

hadoopmarc@...
 


Re: Count Query Optimization

hadoopmarc@...
 

Hi all,

I also thought about the vertex centric index first, but I am afraid that a VCI can only help to filter the edges to follow; it does not help in counting the edges. A better way to investigate is to leave out the final inV() step. So, e.g., you can count the number of distinct v2 ids with:
g.V().has('property1', 'A').outE().has('property1','E').id().map{it.get().getInVertexId()}.dedup().count()

Note that E().id() returns RelationIdentifier objects that contain the edge id as well as the inVertexId and outVertexId. This should diminish the number of storage backend calls.

Best wishes,    Marc


Re: Count Query Optimization

AMIYA KUMAR SAHOO
 

Hi Vinayak,

For query 1:

What is the degree centrality of the vertices having property A? What percentage of their out-edges have property E? If it is small, a VCI will help to speed up this traversal (a schema sketch follows below).
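
A sketch of building such a VCI in java (the edge label 'rel' and the index name are hypothetical; existing edges would need a reindex afterwards):

import org.apache.tinkerpop.gremlin.structure.Direction;
import org.janusgraph.core.EdgeLabel;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

public class BuildVci {
    public static void main(String[] args) {
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql.properties");
        JanusGraphManagement mgmt = graph.openManagement();
        PropertyKey p1 = mgmt.getPropertyKey("property1");
        EdgeLabel rel = mgmt.getEdgeLabel("rel");                       // hypothetical label
        mgmt.buildEdgeIndex(rel, "relByProperty1", Direction.OUT, p1);  // VCI on out-edges
        mgmt.commit();
        graph.close();
    }
}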

You can give the query below a try; I am not sure if it will speed things up.

g.V().has('property1', 'A').
    outE().has('property1','E').
    inV().has('property1', 'B').
    dedup().by(path()).
    count()



On Fri, 12 Mar 2021, 13:30 Vinayak Bali, <vinayakbali16@...> wrote:
Hi All,

The schema consists of A and B as nodes and E as an edge, along with some other nodes and edges.
A: 183468
B: 437317
E: 186513

Query:  g.V().has('property1', 'A').as('v1').outE().has('property1','E').as('e').inV().has('property1', 'B').as('v2').select('v1','e','v2').dedup().count()
Output: 200166
Time Taken: 1min

Query: g.V().has('property1', 'A').aggregate('v').outE().has('property1','E').aggregate('e').inV().has('property1', 'B').aggregate('v').select('v').dedup().as('vetexCount').select('e').dedup().as('edgeCount').select('vetexCount','edgeCount').by(unfold().count())
Output: ==>[vetexCount:383633,edgeCount:200166]
Time: 3.5 mins
Property1 is indexed.
How can I optimize these queries? Minutes for a count query is not acceptable. Please suggest different approaches.

Thanks & Regards,
Vinayak


Threads are unresponsive for some time after a particular amount of data transfer(119MB)

Vinayak Bali
 

Hi All,

We are connecting to janusgraph using java. A cluster connection with the gremlin driver is used for connectivity. At the start we were getting an out of memory error, but tweaking some settings in gremlin-server.yaml resolved that issue.
The issue was raised on StackOverflow: 


Changes made in gremlin-server.yaml:
writeBufferLowWaterMark: 9500000
writeBufferHighWaterMark: 10000000
Every query gets stuck at 119 MB for some time, i.e. approx 5 mins, and then starts working again.
Attaching a screenshot of the error.

Gremlin server configurations:

maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 16384
maxContentLength: 2000000000
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 128
writeBufferLowWaterMark: 9500000
writeBufferHighWaterMark: 10000000
threadPoolWorker: 30
gremlinPool: 0

How can the issue be solved?

Thanks & Regards,
Vinayak


Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

hadoopmarc@...
 

Hi Saurabh,

The workaround you found has implications for performance. So, if you can describe how to reproduce the issue, you can file an issue for it on https://github.com/JanusGraph/janusgraph/issues
This might easily be a scenario that is not covered by the current janusgraph tests.

I see that "id" is indeed a property key in the schema. I assume your input data already had this "id" property and this was not generated by janusgraph. In the former case does replacing "id" by "user_id" make any difference (this will make your queries more readable anyway)?

Best wishes,    Marc


Count Query Optimization

Vinayak Bali
 

Hi All,

The schema consists of A and B as nodes and E as an edge, along with some other nodes and edges.
A: 183468
B: 437317
E: 186513

Query:  g.V().has('property1', 'A').as('v1').outE().has('property1','E').as('e').inV().has('property1', 'B').as('v2').select('v1','e','v2').dedup().count()
Output: 200166
Time Taken: 1min

Query: g.V().has('property1', 'A').aggregate('v').outE().has('property1','E').aggregate('e').inV().has('property1', 'B').aggregate('v').select('v').dedup().as('vetexCount').select('e').dedup().as('edgeCount').select('vetexCount','edgeCount').by(unfold().count())
Output: ==>[vetexCount:383633,edgeCount:200166]
Time: 3.5 mins
Property1 is indexed.
How can I optimize these queries? Minutes for a count query is not acceptable. Please suggest different approaches.

Thanks & Regards,
Vinayak
