Another perspective on JanusGraph embedded versus server mode


Jamie Lawson <jamier...@...>
 

We have a _domain-specific_ REST API that is architecturally decoupled from JanusGraph: users of the REST API have no indication that their calls interact with JanusGraph, or even with a graph. These REST calls interact heavily with the JanusGraph database, which is currently embedded in the same JVM process. Here is a deployment view:


+-----------------------------------------+    +-----------------+
| JVM Process #1                          |    | JVM Process #2  |
|                                         |    |                 |
|  +-----------------+    +------------+  |    |  +-----------+  |
|  | Domain Specific |----| JanusGraph |--+----+--| Cassandra |  |
|  |    REST API     |    |  Embedded  |  |    |  |  Backend  |  |
|  +-----------------+    +------------+  |    |  +-----------+  |
|                                         |    |                 |
+-----------------------------------------+    +-----------------+


Now consider load balancing. We want the REST API to be the only way to access the graph database; that is what keeps it "operationally consistent". If all updates go through the REST API, nothing gets into the database that doesn't make sense in the context of the domain. As we expand, is there a good reason to split JVM Process #1 so that JanusGraph Server runs in a separate process, like this:


+----------------------+    +-----------------+    +-----------------+
| JVM Process #1A      |    | JVM Process #1B |    | JVM Process #2  |
|                      |    |                 |    |                 |
|  +-----------------+ |    | +------------+  |    |  +-----------+  |
|  | Domain Specific |-+----+-| JanusGraph |--+----+--| Cassandra |  |
|  |    REST API     | |    | |   SERVER   |  |    |  |  Backend  |  |
|  +-----------------+ |    | +------------+  |    |  +-----------+  |
|                      |    |                 |    |                 |
+----------------------+    +-----------------+    +-----------------+

My expectation would be that connecting to JanusGraph through the embedded API would be much faster than connecting through a WebSocket API. Is that the case?

Now as we expand, is it reasonable to keep running our REST endpoint with an embedded JanusGraph in the same process, and to replicate that process so that all of the embedded JanusGraph instances talk to the same Cassandra backend? Something like this:


+-----------------------------------------+
| JVM Process #1.1 on Node #1             |
|                                         |
|  +-----------------+    +------------+  |
|  | Domain Specific |----| JanusGraph |--+--------------+
|  | REST API endpt 1|    |  Embedded  |  |              |
|  +-----------------+    +------------+  |              |
|                                         |              |
+-----------------------------------------+              |
                                                         |
+-----------------------------------------+    +^^^^^^^^^|^^^^^^^+
| JVM Process #1.2 on Node #2             |    { Cluster Process }
|                                         |    {         |       }
|  +-----------------+    +------------+  |    {  +-----------+  }
|  | Domain Specific |----| JanusGraph |--+----+--| Cassandra |  }
|  | REST API endpt 2|    |  Embedded  |  |    {  |  Backend  |  }
|  +-----------------+    +------------+  |    {  +-----------+  }
|                                         |    {         |       }
+-----------------------------------------+    +^^^^^^^^^|^^^^^^^+
                                                         |
+-----------------------------------------+              |
| JVM Process #1.3 on Node #3             |              |
|                                         |              |
|  +-----------------+    +------------+  |              |
|  | Domain Specific |----| JanusGraph |--+--------------+
|  | REST API endpt 3|    |  Embedded  |  |
|  +-----------------+    +------------+  |
|                                         |
+-----------------------------------------+


The real question here is, if different embedded JanusGraphs have the same backend, do they describe the same graph (modulo eventual consistency)? I expect that they will have different stuff in cache, but will they describe the same graph?
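To make the cache concern concrete, here is a toy sketch in plain Java. It is not JanusGraph code and does not model JanusGraph's actual caching; the `Backend` and `Instance` classes and the key names are hypothetical stand-ins I made up for illustration. It only shows the general pattern I am asking about: several instances with private caches over one shared store see the same data, but a cached read can lag behind another instance's write until the entry is evicted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model only: NOT JanusGraph's cache implementation. */
final class SharedBackendSketch {

    /** Hypothetical stand-in for the shared Cassandra cluster. */
    static final class Backend {
        private final Map<String, String> rows = new ConcurrentHashMap<>();

        String read(String key) { return rows.get(key); }

        void write(String key, String value) { rows.put(key, value); }
    }

    /** Hypothetical stand-in for one embedded graph instance with a private read cache. */
    static final class Instance {
        private final Backend backend;
        private final Map<String, String> cache = new HashMap<>();

        Instance(Backend backend) { this.backend = backend; }

        String read(String key) {
            // Serve from the local cache when possible, otherwise load from the backend.
            return cache.computeIfAbsent(key, backend::read);
        }

        void write(String key, String value) {
            backend.write(key, value);
            cache.put(key, value); // only this instance's cache learns of the write
        }

        void evict(String key) { cache.remove(key); }
    }

    public static void main(String[] args) {
        Backend shared = new Backend();
        Instance a = new Instance(shared);
        Instance b = new Instance(shared);

        a.write("vertex:1/name", "alice");
        System.out.println(b.read("vertex:1/name")); // b loads a's write from the shared backend

        a.write("vertex:1/name", "alicia");
        System.out.println(b.read("vertex:1/name")); // b still serves its own cached copy

        b.evict("vertex:1/name");
        System.out.println(b.read("vertex:1/name")); // after eviction, b reloads from the backend
    }
}
```

In this toy model both instances describe the same underlying data, and the only divergence is the window in which one instance serves a cached value. Whether JanusGraph behaves the same way is exactly the question.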

And should we expect a performance advantage if we split the JanusGraph part out of the REST API process and run it as JanusGraph Server? Keep in mind that all interaction with the graph will go through the REST API, and that each REST call may make a number of sequential JanusGraph (Gremlin) calls.
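The reason I care about the sequential calls can be shown with a back-of-the-envelope model in plain Java. Every number below is a made-up placeholder, not a measurement, and the class and parameter names are hypothetical. The assumption is simply that sequential calls cannot overlap, so any per-call cost added by an extra process hop is multiplied by the number of Gremlin calls in one REST request.

```java
import java.time.Duration;

/** Toy latency model; every number passed in is a placeholder, not a measurement. */
final class CallOverheadModel {

    /** Total time for one REST call that issues `calls` sequential graph calls. */
    static Duration restCallTime(int calls, Duration backendTimePerCall, Duration hopOverheadPerCall) {
        // Sequential calls cannot overlap, so the per-call costs add up linearly.
        return backendTimePerCall.plus(hopOverheadPerCall).multipliedBy(calls);
    }

    public static void main(String[] args) {
        int calls = 20;                             // hypothetical Gremlin calls per REST request
        Duration backend = Duration.ofMillis(2);    // hypothetical backend time per call
        Duration embeddedHop = Duration.ZERO;       // in-process call, treated as negligible here
        Duration serverHop = Duration.ofMillis(1);  // hypothetical round trip plus serialization

        System.out.println("embedded: " + restCallTime(calls, backend, embeddedHop).toMillis() + " ms");
        System.out.println("server:   " + restCallTime(calls, backend, serverHop).toMillis() + " ms");
    }
}
```

Under these placeholder values the extra hop costs the same fixed amount on every one of the sequential calls, which is why I would expect server mode to be slower per REST request unless something else offsets it.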
