Best (Correct?) Way to Access JanusGraph Remotely


zbec...@...
 

Hello, I am building a Java RPC server that takes in multi-threaded requests, queries the database, and returns a result to the client. I understand JanusGraph allows concurrent transactions, and I would prefer that every request run within its own transaction. However, I am a bit confused about how to access JanusGraph remotely, as I have seen multiple implementations.

The first uses:
JanusGraph graph = JanusGraphFactory.build()
    .set("storage.backend", "hbase")
    .set("storage.hostname", "lily.local")
    .set("storage.hbase.table", "birdr")
    .open();

GraphTraversalSource g = graph.traversal();

Then I would call the following in each request block of the RPC server:
g.tx().createThreadedTx()


But this appears to be building a JanusGraph server. Is that correct?

Another implementation I have seen is:

Cluster cluster = Cluster.open("path to config file");

And then I would call the following in each request block of the RPC server:

Client client = cluster.connect();
GraphTraversalSource g = AnonymousTraversalSource.traversal().withRemote(DriverRemoteConnection.using(client, "g"));

Then I would simply close the client at the end of each request. However, this GraphTraversalSource does not support transactions.

I believe the second option is more correct, but do Client instances maintain transactions? Are they transactions themselves?

Any kind of explanation or clarity would be greatly appreciated.


HadoopMarc <bi...@...>
 

Hi,

I think you need the first use, assuming you do not want the clients of your RPC server to be gremlin-aware. You then use JanusGraph in the embedded way.

The second use is when you do not roll your own graph application endpoint but rather use the Gremlin Server application to connect with gremlin-aware clients.
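For the first, embedded use, the per-request transaction pattern could look roughly like the sketch below. It is only an illustration under assumptions: Tx, Graph and FakeGraph are hypothetical stand-ins so that the code runs without JanusGraph or a storage backend on the classpath. In a real embedded setup the shared JanusGraph instance would take the place of Graph, and the transaction it hands out per request would take the place of Tx.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class PerRequestTransactions {

    // Hypothetical stand-in for a graph transaction (assumption, not a JanusGraph type).
    interface Tx {
        void commit();
        void rollback();
    }

    // Hypothetical stand-in for the graph instance that is opened once and shared.
    interface Graph {
        Tx newTransaction();
    }

    // Runs one request in its own transaction: commit on success, roll back on failure.
    static <R> R handleRequest(Graph graph, Function<Tx, R> work) {
        Tx tx = graph.newTransaction();
        try {
            R result = work.apply(tx);
            tx.commit();
            return result;
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
    }

    // In-memory fake, present only to make the sketch runnable.
    static final class FakeGraph implements Graph {
        final AtomicInteger opened = new AtomicInteger();
        final AtomicInteger commits = new AtomicInteger();
        final AtomicInteger rollbacks = new AtomicInteger();

        @Override
        public Tx newTransaction() {
            opened.incrementAndGet();
            return new Tx() {
                @Override public void commit() { commits.incrementAndGet(); }
                @Override public void rollback() { rollbacks.incrementAndGet(); }
            };
        }
    }

    public static void main(String[] args) throws Exception {
        FakeGraph graph = new FakeGraph();                 // opened once at server start
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Each RPC request becomes a task that opens and finishes its own transaction.
            Callable<String> request = () -> handleRequest(graph, tx -> "result");
            Future<String> reply = pool.submit(request);
            System.out.println(reply.get());
        } finally {
            pool.shutdown();
        }
        System.out.println(graph.opened.get() + " opened, " + graph.commits.get() + " committed");
    }
}
```

The point of the design is that the graph is opened once, while every request thread gets a fresh transaction that is always either committed or rolled back before the reply goes out.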

HTH,    Marc

(In reply to the message from zb...@... of Tuesday, 11 August 2020, 15:25:33 UTC+2.)


Zach Becker <zbec...@...>
 

What do you mean by "gremlin-aware"? Both implementations will give me a graph traversal, which I'll use to run Gremlin queries against the databases.


HadoopMarc <bi...@...>
 

Hi,

We are talking about the following scenarios:

client ----------- Gremlin Server ------------ storage backend
with the client firing remote gremlin traversals ("gremlin-aware")


client ----------- own server(embedded JanusGraph) ------------- storage backend
with the client firing general requests to be translated into gremlin traversals at your own server


client ----------- own server ----------- Gremlin Server ------------ storage backend
with the client firing general requests and the own server firing remote gremlin traversals

I assumed the last 3-tier scenario would not be efficient and thus not what you want.

Regarding transactions:
Gremlin clients apply one transaction per request (leaving aside the string-based script requests that the TinkerPop people advise against). However, add and drop steps that you want to apply in a single transaction can simply be chained inside a single traversal.

Regarding threads in the gremlin client:
Different threads can use the same connection, see https://tinkerpop.apache.org/docs/current/reference/#_configuration
The multithreaded transactions you mention do not apply to remote bytecode traversals. Why would you need these? (I can only imagine very time-sensitive applications.)
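To illustrate the point about threads sharing one connection, here is a minimal sketch of a server that creates the connection once and reuses it from every request thread, instead of opening and closing a client per request. RemoteConnection and FakeConnection are hypothetical stand-ins (assumptions, not TinkerPop driver classes) so that the code runs on its own. With the real driver, the shared Cluster or remote traversal source would sit in that position.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedConnectionServer implements AutoCloseable {

    // Hypothetical stand-in for a thread-safe remote connection to Gremlin Server.
    interface RemoteConnection extends AutoCloseable {
        String submit(String request);
        @Override void close();
    }

    private final RemoteConnection connection;   // created once, shared by all request threads
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    SharedConnectionServer(RemoteConnection connection) {
        this.connection = connection;
    }

    // Each RPC request runs on a pool thread but reuses the same connection.
    Future<String> handle(String request) {
        Callable<String> task = () -> connection.submit(request);
        return pool.submit(task);
    }

    // Closes the connection once, at shutdown, after in-flight requests have finished.
    @Override
    public void close() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            connection.close();
        }
    }

    // In-memory fake, present only to make the sketch runnable.
    static final class FakeConnection implements RemoteConnection {
        final AtomicInteger requests = new AtomicInteger();
        final AtomicInteger closes = new AtomicInteger();

        @Override
        public String submit(String request) {
            requests.incrementAndGet();
            return "echo:" + request;
        }

        @Override
        public void close() {
            closes.incrementAndGet();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeConnection connection = new FakeConnection();
        try (SharedConnectionServer server = new SharedConnectionServer(connection)) {
            Future<String> first = server.handle("q1");
            Future<String> second = server.handle("q2");
            System.out.println(first.get() + " " + second.get());
        }
        System.out.println("connection closed " + connection.closes.get() + " time(s)");
    }
}
```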

Best wishes,   Marc


(In reply to the message from zb...@... of Tuesday, 11 August 2020, 17:22:22 UTC+2.)