
Re: Authentication in JanusGraph Server

hadoopmarc@...
 

Hi Graham,

This was certainly one to investigate over the weekend. Whereas you started investigating from the inside of JanusGraph, I started from the user perspective, and this is what I did:
  1. I replicated your steps on janusgraph-full-0.5.3 and hit the same issue (incorrect username/password)
  2. I also replicated your steps on janusgraph-0.3.2 to be sure no bugs were introduced in later versions, but still the same issue
  3. I checked the old user list and found https://groups.google.com/g/janusgraph-users/c/iVqlUS2zQbc/m/vmf8PgEQBAAJ  This was interesting: someone had problems with the credentialsDb and only got it working after switching from a BerkeleyJE backend to an HBase backend. A pattern emerged: your issue also involved BerkeleyJE
  4. In the authentication section of the gremlin-server.yaml I changed the properties file for the credentialsDb to one using cql-es with a keyspace "credentials" and... remote authentication worked (a sketch of that properties file follows below)
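
For reference, the credentialsDb properties file I ended up with looked roughly like this (just a sketch; the hostnames and the Elasticsearch index backend reflect my local cql-es test setup):

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1
storage.cql.keyspace=credentials
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
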
This was a nasty one, but the effort you had already put in inspired me to do my part. I will open an issue for this on GitHub.

Best wishes,   Marc


Re: How to circumvent transaction cache?

hadoopmarc@...
 

Hi Timon,

Adding to Ted's answer, I can imagine that your new data enters your pipeline from a Kafka queue. With a microbatching solution, e.g. Apache Spark streaming, you could pre-shuffle your data per microbatch to make sure that all data relating to a branch end up in a single partition. After that, a single thread can handle this single partition in one JanusGraph transaction. This approach seems to fit your use case better than trying to circumvent ACID limits in a tricky way.
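
Just to illustrate the idea, here is a rough sketch with the plain Spark Java API (the Edit class, its getBranchId() accessor, the applyEdit() helper and the properties file path are all placeholders):

import org.apache.spark.HashPartitioner;
import org.apache.spark.api.java.JavaRDD;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;

// called once per micro-batch with the edits received in that batch
void processMicroBatch(JavaRDD<Edit> edits, int numPartitions) {
    edits.keyBy(Edit::getBranchId)                         // key every edit by its branch
         .partitionBy(new HashPartitioner(numPartitions))  // all edits of one branch land in one partition
         .values()
         .foreachPartition(iter -> {                       // one thread handles one partition
             JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql.properties");
             JanusGraphTransaction tx = graph.newTransaction();
             try {
                 while (iter.hasNext()) {
                     applyEdit(tx, iter.next());           // placeholder: apply a single edit in this tx
                 }
                 tx.commit();                              // one transaction per branch partition
             } catch (Exception e) {
                 tx.rollback();
                 throw e;
             } finally {
                 graph.close();
             }
         });
}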

Best wishes,    Marc


Re: How to circumvent transaction cache?

Boxuan Li
 

Hi Timon,

As I mentioned earlier, the only way I can think of (assuming you are not concerned about the consistency of data storage as Ted mentioned) is to modify JanusGraph source code:

In the CacheVertex class, there is a data structure: protected final Map<SliceQuery, EntryList> queryCache.

What you could do is to add a method to that class:

public void refresh() {
    // clear the cached slice-query results so the next read goes to the storage backend
    queryCache.clear();
}

And then you can call refresh() whenever you want to load the new value from storage rather than from the cache:

((CacheVertex) v1).refresh();

Hope this helps,
Boxuan


On Mar 6, 2021, at 12:32 AM, Ted Wilmes <twilmes@...> wrote:

Hi Timon,
Jumping in late on this one, but I wanted to point out that even if you could read it prior to committing to check that your constraint is maintained, most of the JG storage layers do not provide ACID guarantees. FoundationDB is the one distributed option, and BerkeleyDB can do it for a single-instance setup. Since you do not have ACID guarantees in most cases, I think you could still have a case where another transaction commits prior to your commit even though you saw isPublished = false when you checked it. One possible way around this without ACID would be to process all mutations for a branch on one thread, effectively single-threading access to it so that you know that no other user was writing to the branch while you were reading.

--Ted

On Fri, Mar 5, 2021 at 8:52 AM <timon.schneider@...> wrote:
Thanks for your suggestion, but the consistency setting does not solve my problem.




Re: How to circumvent transaction cache?

Ted Wilmes
 

Hi Timon,
Jumping in late on this one, but I wanted to point out that even if you could read it prior to committing to check that your constraint is maintained, most of the JG storage layers do not provide ACID guarantees. FoundationDB is the one distributed option, and BerkeleyDB can do it for a single-instance setup. Since you do not have ACID guarantees in most cases, I think you could still have a case where another transaction commits prior to your commit even though you saw isPublished = false when you checked it. One possible way around this without ACID would be to process all mutations for a branch on one thread, effectively single-threading access to it so that you know that no other user was writing to the branch while you were reading.
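
As an illustration of that single-threading idea, a minimal sketch (the branch id type and what the mutation task does are up to you):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Routes every mutation of a given branch to the same single-threaded executor,
// so no two transactions ever touch the same branch concurrently.
public class BranchSerializer {
    private final ConcurrentMap<String, ExecutorService> executors = new ConcurrentHashMap<>();

    public void submit(String branchId, Runnable mutation) {
        executors.computeIfAbsent(branchId, id -> Executors.newSingleThreadExecutor())
                 .submit(mutation);  // the mutation opens its own JanusGraph tx, applies changes, commits
    }
}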

--Ted

On Fri, Mar 5, 2021 at 8:52 AM <timon.schneider@...> wrote:
Thanks for your suggestion, but the consistency setting does not solve my problem.


Re: How to circumvent transaction cache?

timon.schneider@...
 

Thanks for your suggestion, but the consistency setting does not solve my problem.


Re: How to circumvent transaction cache?

Nicolas Trangosi <nicolas.trangosi@...>
 

Hi Timon,
It seems that you can force JG to re-read elements just before commit according to

I have never tried the mgmt.setConsistency option, but it may help you.
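
For example, something along these lines (untested; the property key name isPublished is the one from your thread, and graph is your open JanusGraph instance):

import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.ConsistencyModifier;
import org.janusgraph.core.schema.JanusGraphManagement;

JanusGraphManagement mgmt = graph.openManagement();
PropertyKey isPublished = mgmt.getPropertyKey("isPublished");
mgmt.setConsistency(isPublished, ConsistencyModifier.LOCK);  // take a lock on this property at commit time
mgmt.commit();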

Regards,
Nicolas

On Fri, Mar 5, 2021 at 10:20 AM, <timon.schneider@...> wrote:

[Edited Message Follows]

Thanks for your reply.

The issue is that we need to refresh some vertices mid-transaction. Rolling back is not an option, as that would erase edits that we're making in our transaction. Disabling the transaction cache could be one solution. Using a threaded tx could be an option as well, as that transaction does see edits made by other users, as opposed to the original transaction:
A starts transaction and makes edits, does not commit yet
B makes an edit to vertex X and commits
A cannot see B's edit to vertex X unless A commits or rolls back.
Again, it is possible to read X by using a ThreadedTx, but I'm interested in whether there's another way to refresh a vertex mid-transaction.

Kr,
Timon





Re: How to circumvent transaction cache?

timon.schneider@...
 
Edited

Thanks for your reply.

The issue is that we need to refresh some vertices mid-transaction. Rolling back is not an option, as that would erase edits that we're making in our transaction. Disabling the transaction cache could be one solution. Using a threaded tx could be an option as well, as that transaction does see edits made by other users, as opposed to the original transaction:
A reads vertex X and then starts transaction and makes edits, does not commit yet
B may or may not edit X
A continues editing and, before committing, it needs to make sure vertex X was not changed by B, or else it rolls back.
Again, it is possible to read X by using a ThreadedTx, but I'm interested in whether there's another way to refresh a vertex mid-transaction.

Kr,
Timon


Re: Authentication in JanusGraph Server

grahamwallis.dev@...
 

Hi @hadoopmarc,

Thanks for replying, and no apology needed - it's a good question. Although I failed to mention it in my question, I did set the credentials to ('graham', 'sasl-password') in the sasl-remote.yaml file when testing with JanusGraph as the credentials store.

Setting a breakpoint in the server, I could see the correct credentials being received, and the credentials store traversal looked fine; but no vertex was returned.

All the best
  Graham


Re: how to delete Ghost vertices and ghost edges?

Boxuan Li
 


On Thu, Mar 4, 2021 at 4:42 PM, <vamsi.lingala@...> wrote:

gremlin> g.V(6389762617560).valueMap()
==>{}
gremlin>
gremlin> g.V().hasLabel("MAID").has("madsfid","sfmsdlk").outE("MAIH1").as("e").inV().as("v").select("e", "v").by(valueMap())
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}


Re: How to circumvent transaction cache?

Boxuan Li
 

Hi Timon,

I don't even think you will be able to disable the tx-cache by using createThreadedTx(), or equivalently, newTransaction()/buildTransaction(). Unfortunately, as long as your transaction is not readOnly(), the effective transaction-level vertex cache size will be Math.max(100, cache.tx-cache-size).

To the best of my knowledge, the only way to completely disable the transaction-level cache is to modify the JanusGraph source code. A workaround would be to always start a new transaction to check whether the value has changed.
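
For example, a rough sketch of that workaround (graph is your open JanusGraph instance and branchId the id of the branch vertex):

import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.core.JanusGraphVertex;

JanusGraphTransaction freshTx = graph.newTransaction();
try {
    JanusGraphVertex branch = freshTx.getVertex(branchId);  // re-read from storage, not from your tx cache
    Boolean published = branch.value("isPublished");
    if (Boolean.TRUE.equals(published)) {
        // another user has published the branch: roll back your own transaction
    }
} finally {
    freshTx.rollback();  // the check is read-only, so just discard this transaction
}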

Best regards,
Boxuan

On Wed, Mar 3, 2021 at 9:11 PM, <timon.schneider@...> wrote:

Our application has transactions editing many vertices representing elements of a branch. This branch is also represented by a vertex that has a boolean property isPublished. Before committing such a transaction, we need to know whether another user set the isPublished property on the branch vertex to true, in which case the transaction should be rolled back.

Here’s the problem:
* User A reads the branch vertex but doesn’t close transaction
* User B changes the isPublished property to true and commits (while A is still making changes)
* User A read locks the vertex with an external locking API
* User A queries the branch vertex again (to make sure isPublished is still false) in the same thread but gets the old values because of the transaction cache.
Now user A can commit data even though the branch isPublished is true.

I know it’s possible to use createThreadedTx() to circumvent the ThreadLocal transaction cache. However, such refreshes will be very common in our application and ideally we would be able to execute a refresh within the main transaction to minimise complexity and workarounds. Is this possible? And if not, are there any possibilities to turn off transaction cache entirely?

Thanks in advance,
Timon


how to delete Ghost vertices and ghost edges?

vamsi.lingala@...
 

gremlin> g.V(6389762617560).valueMap()
==>{}
gremlin>
gremlin> g.V().hasLabel("MAID").has("madsfid","sfmsdlk").outE("MAIH1").as("e").inV().as("v").select("e", "v").by(valueMap())
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}


Re: Gremlin Query to return count for nodes and edges

Vinayak Bali
 

Hi Marc,

The backend used is Cassandra. I was just wondering if we can load the data from Cassandra's data store to the in-memory backend to speed up the process.
I tried OLAP by configuring Hadoop and Spark with the help of references shared in the documentation. A simple query to retrieve 1 node from the graph took around 5 mins. 
Based on your experience, could you please share the steps to follow to solve the issue?

Thanks & Regards,
Vinayak

On Wed, Feb 24, 2021 at 9:32 PM <hadoopmarc@...> wrote:
Hi Vinayak,

Speeding up your query depends on your setup. 15,000 vertices/second is already fast. Is this the JanusGraph inmemory backend? Or ScyllaDB?

In a perfect world (we are not there yet), your query would profit from parallelization (OLAP). JanusGraph supports both the withComputer() and withComputer(SparkGraphComputer) start steps, but the former is undocumented and the performance gains of the latter are often disappointing.
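
For reference, an OLAP count with SparkGraphComputer typically looks something like this (a sketch; the properties file is the read-cql example shipped with the JanusGraph distribution, your own config may differ):

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

Graph hadoopGraph = GraphFactory.open("conf/hadoop-graph/read-cql.properties");
GraphTraversalSource g = hadoopGraph.traversal().withComputer(SparkGraphComputer.class);
long vertexCount = g.V().count().next();  // runs as a Spark job over the whole graph
long edgeCount = g.E().count().next();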

Best wishes,    Marc


Re: Authentication in JanusGraph Server

hadoopmarc@...
 

Sorry for asking, but you did not state it explicitly: you did modify your sasl-remote.yaml file to reflect the new ('graham', 'sasl-password') credentials, didn't you?

Marc


Authentication in JanusGraph Server

Graham Wallis <grahamwallis.dev@...>
 

Hi, 

I've been trying to use authentication over a websocket connection to a JanusGraph Server. 

If I configure the server to use a SimpleAuthenticator and a TinkerGraph for the credentials, as described in the Tinkerpop documentation, it works. 

In this mode, my gremlin-server.yaml is configured for authentication as follows: 

authentication: {
  authenticator: org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator,
  authenticationHandler: org.apache.tinkerpop.gremlin.server.handler.SaslAuthenticationHandler,
  config: {
    credentialsDb: conf/tinkergraph-credentials.properties
  }
}

where the tinkergraph-credentials.properties file is the same as the example from Tinkerpop:

gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph 
gremlin.tinkergraph.vertexIdManager=LONG 
gremlin.tinkergraph.graphLocation=data/credentials.kryo 
gremlin.tinkergraph.graphFormat=gryo 

My gremlin-server.yaml also has the following SSL configuration: 

ssl: {
  enabled: true,
  sslEnabledProtocols: [TLSv1.2],
  keyStore: server.jks,
  keyStorePassword: mykeystore
}

I've created a self-signed certificate for localhost, added it to the server.jks keystore (with the key password the same as the store password). 
Because my client (console) is on the same machine as the server, I used the server.jks keystore as the truststore for the client, and created 
a sasl-remote.yaml file for the client, with the following: 

hosts: [localhost] 
port: 8182 
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }} 
username: stephen 
password: password 
connectionPool: { 
  enableSsl: true, 
  sslEnabledProtocols: [TLSv1.2], 
  trustStore: server.jks, 
  trustStorePassword: mykeystore 
} 
I can start a gremlin-console and connect to the server, using the credentials ("stephen", "password"). 

:remote connect tinkerpop.server conf/sasl-remote.yaml session 

and subsequent remote operations against my (real) graph succeed. 

The above all works nicely. I can step through the invocation of SimpleAuthenticator's authenticate() method in the server in the debugger and it does exactly what you'd expect. 


If I try to do the same using a JanusGraph DB to store the credentials I can't get the client to authenticate. 

I tried using the following janusgraph-credentials-server.properties file for my credentials store: 

gremlin.graph=org.janusgraph.core.JanusGraphFactory 
storage.backend=berkeleyje 
storage.directory=../cred/berkeley 

And changed my gremlin-server yaml as follows: 

authentication: {
  authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.JanusGraphSimpleAuthenticator,
  authenticationHandler: org.apache.tinkerpop.gremlin.server.handler.SaslAuthenticationHandler,
  config: {
    defaultUsername: graham,
    defaultPassword: sasl-password,
    credentialsDb: conf/janusgraph-credentials-server.properties
  }
}

The ../cred/berkeley database is created during startup of the Gremlin Server. If I subsequently stop the server and open the credentials database using a gremlin-console (locally), I can see that the default user has been added to it, the vertex is correctly labelled (as 'user'), and the username and (hashed) password match. So the credentials store looks OK.

However, if I now create a connection to the server and try to perform a remote operation, it doesn't authenticate and always results in "Username and/or password are incorrect".

Stepping through the server code in the debugger, I noticed that the JanusGraphSimpleAuthenticator authenticate() method is never called, because the handler calls the SimpleAuthenticator's authenticate() method directly. This is probably fine as the former delegates to the latter anyway. But when the SimpleAuthenticator's authenticate() actually performs the credentials traversal, it does not find the user. 

I wondered whether I should be using a JanusGraph-specific authentication handler, but that doesn't look like it would help; for a websocket connection the SaslAndHMACAuthenticationHandler will delegate to the channelRead method of its superclass, i.e. SaslAuthenticationHandler, which is the same as the above. The only difference I can see in the code is that the SimpleAuthenticator is using a TinkerPop generic Graph to create its CredentialTraversalSource, whereas the JanusGraphSimpleAuthenticator uses a JanusGraph.

Can anyone see what I'm doing wrong?


Best regards,
 Graham

Linux Foundation LFAIData
Project: Egeria


How to circumvent transaction cache?

timon.schneider@...
 

Our application has transactions editing many vertices representing elements of a branch. This branch is also represented by a vertex that has a boolean property isPublished. Before committing such a transaction, we need to know whether another user set the isPublished property on the branch vertex to true, in which case the transaction should be rolled back.

Here’s the problem:
* User A reads the branch vertex but doesn’t close transaction
* User B changes the isPublished property to true and commits (while A is still making changes)
* User A read locks the vertex with an external locking API
* User A queries the branch vertex again (to make sure isPublished is still false) in the same thread but gets the old values because of the transaction cache.
Now user A can commit data even though the branch isPublished is true.

I know it’s possible to use createThreadedTx() to circumvent the ThreadLocal transaction cache. However, such refreshes will be very common in our application and ideally we would be able to execute a refresh within the main transaction to minimise complexity and workarounds. Is this possible? And if not, are there any possibilities to turn off transaction cache entirely?
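
For context, the createThreadedTx() approach I mean is roughly this (a sketch; graph and branchId are placeholders):

import org.apache.tinkerpop.gremlin.structure.Graph;

Graph threadedTx = graph.tx().createThreadedTx();  // independent transaction, not bound to this thread
try {
    Boolean published = (Boolean) threadedTx.traversal().V(branchId)
            .values("isPublished").next();         // sees committed data, bypassing my tx-level cache
    // decide whether to roll back the main transaction based on 'published'
} finally {
    threadedTx.tx().rollback();                    // discard the read-only side transaction
}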

Thanks in advance,
Timon


Re: Not able to reindex with bigtable as backend

hadoopmarc@...
 

The vertex centric index is written to the storage backend, so I guess the section on write performance configs should be relevant:
https://docs.janusgraph.org/advanced-topics/bulk-loading/#optimizing-writes-and-reads

I have no idea whether row locking plays a role in writing the vertex-centric index. If so, the config properties you mention are relevant, and maybe also the config for batch loading, which disables locking:
https://docs.janusgraph.org/advanced-topics/bulk-loading/#batch-loading
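
For reference, batch loading is a single setting in the graph's properties file (a sketch; combine it with the write-performance settings from the first link, and only use it when the input data is known to be consistent):

# disables JanusGraph's internal locking and consistency checks
storage.batch-loading=true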

Id allocation does not seem relevant (it has its own error messages so you would notice).

Marc


Re: Not able to reindex with bigtable as backend

liqingtaobkd@...
 

Thanks a lot for your reply Marc. I browsed through the older threads and didn't find a good solution for this. 

"BigTable cannot keep up with your index repair workers" - could you provide a little bit insights for what an index repair job does, or any documentation?
I was trying a few storage settings and didn't get any luck yet: storage.write-time/storage.lock.wait-time/storage.lock.expiry-time/etc. Do you think it will make a difference? 

As you suggested, I'll try delete the index and retry from start.
For our application, we do need to have the option of reindexing current data, so I'll need to make sure it works. Do you see similar issue for Cassandra? We deploy it on GCP so we try Bigtable first.
Do you have any recommendation on backend storage for GCP please?


Re: Not able to reindex with bigtable as backend

hadoopmarc@...
 

I checked on the existing issues and the following one looks similar to your issue:
https://github.com/JanusGraph/janusgraph/issues/1803

There are also some older questions in the janusgraph-users list. The only workaround seems to be to define the index before adding the data.

Best wishes,     Marc


Re: Not able to reindex with bigtable as backend

hadoopmarc@...
 

The stacktraces you sent are not from reindexing but from an index repair job. TemporaryBackendException is usually an indication of unbalanced distributed system components; apparently BigTable cannot keep up with your index repair workers. Is it still possible to delete the index and retry from the start?

Otherwise, you could check whether reindexing works with just a small graph. There is little to go on right now.
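
For reference, disabling an index before retrying goes through the management API, roughly like this (a sketch; "indexName" is a placeholder, and for a vertex-centric index you would use getRelationIndex()/awaitRelationIndexStatus() instead):

import org.janusgraph.core.schema.JanusGraphIndex;
import org.janusgraph.core.schema.JanusGraphManagement;
import org.janusgraph.core.schema.SchemaAction;
import org.janusgraph.core.schema.SchemaStatus;
import org.janusgraph.graphdb.database.management.ManagementSystem;

JanusGraphManagement mgmt = graph.openManagement();
JanusGraphIndex index = mgmt.getGraphIndex("indexName");    // placeholder index name
mgmt.updateIndex(index, SchemaAction.DISABLE_INDEX).get();
mgmt.commit();

// wait until all instances see the index as DISABLED before removing or redefining it
ManagementSystem.awaitGraphIndexStatus(graph, "indexName")
                .status(SchemaStatus.DISABLED)
                .call();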

Best wishes,    Marc


Re: ConfiguredGraphFactory and Authentication not working

Jansen, Jan