Changing graphname at runtime


Diglio A. Simoni
 

Note: I cross-posted this to https://groups.google.com/g/gremlin-users to reach a broader audience. If you're a member of both groups, please accept my apologies!

Hello,
 
I have a consumer API that talks to JanusGraph. It's configured to connect to a ConfiguredGraphFactory graph with graphname "Graph_A". I want to update Graph_A in *its totality*, which implies dropping it and recreating it from scratch. The problem is that this process takes a long time, and I don't want the system to be down while Graph_A is being rebuilt. So I have another ConfiguredGraphFactory graph with graphname "Graph_B". I take whatever time is required to create Graph_B, and when it's done, I need to stop the system, change the configuration of the API so that it connects to Graph_B, and restart the system.
 
But I don't want to do that.
 
Instead, what I'd like to do is this: when Graph_B is ready, I would call ConfiguredGraphFactory.drop('Graph_A') and *rename* Graph_B to Graph_A.
 
Is that possible? If not, does anybody have another solution? This is akin to double buffering in computer graphics.
 


hadoopmarc@...
 

Is this what you are looking for? It includes an explicit example:

https://docs.janusgraph.org/basics/configured-graph-factory/#updating-configurations

You can version your graph in the storage and indexing backends, but keep the graph name facing the end user the same.

Best wishes,    Marc


Diglio A. Simoni
 

OK, so that I'm clear, what you're suggesting is that I try something like:

// Create and open the main graph, backed by HBase table TABLE_A
map = new HashMap();
map.put("storage.backend", "hbase");
map.put("storage.hostname", "xx.xx.xx.xx,yy.yy.yy.yy,zz.zz.zz.zz");
map.put("storage.hbase.table", "TABLE_A");
map.put("graph.graphname", "GRAPH");
configuration = new MapConfiguration(map);
configuration.setDelimiterParsingDisabled(true);
ConfiguredGraphFactory.createConfiguration(configuration);
graph = ConfiguredGraphFactory.open("GRAPH");

 

// Create, open and update the replacement graph, backed by HBase table TABLE_B
map = new HashMap();
map.put("storage.backend", "hbase");
map.put("storage.hostname", "xx.xx.xx.xx,yy.yy.yy.yy,zz.zz.zz.zz");
map.put("storage.hbase.table", "TABLE_B");
map.put("graph.graphname", "GRAPH_TEMP");
configuration = new MapConfiguration(map);
configuration.setDelimiterParsingDisabled(true);
ConfiguredGraphFactory.createConfiguration(configuration);
graph = ConfiguredGraphFactory.open("GRAPH_TEMP");

 

// Modify GRAPH_TEMP, and when it's time to make it the live one,
// point GRAPH at TABLE_B (updateConfiguration takes a Configuration, not a raw map)
map = new HashMap();
map.put("storage.hbase.table", "TABLE_B");
ConfiguredGraphFactory.updateConfiguration("GRAPH", new MapConfiguration(map));
graph = ConfiguredGraphFactory.open("GRAPH");


But that raises some additional questions:
  • Do I need to ConfiguredGraphFactory.close(GRAPH) before I update its configuration?
  • What happens to GRAPH_TEMP? Wouldn't it be still pointing to the same storage backend HBase table as GRAPH, i.e. to TABLE_B?
  • If I want to reuse the same scheme, I'd need some logic so that the next time I renew GRAPH, GRAPH_TEMP talks to TABLE_A instead, and I then switch GRAPH to use TABLE_A, correct?
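
To make the last question concrete, here is a minimal sketch of the alternation logic I have in mind, in plain Java with no JanusGraph dependency. The class and method names (TableToggle, standbyFor) are hypothetical; only the table names TABLE_A and TABLE_B come from the example above.

```java
// Hypothetical helper: given the table that GRAPH currently points at,
// return the other table, which is the one to rebuild via GRAPH_TEMP.
final class TableToggle {
    static String standbyFor(String liveTable) {
        switch (liveTable) {
            case "TABLE_A":
                return "TABLE_B";
            case "TABLE_B":
                return "TABLE_A";
            default:
                // Fail loudly rather than guess when the name is unexpected.
                throw new IllegalArgumentException("Unknown table: " + liveTable);
        }
    }

    public static void main(String[] args) {
        String live = "TABLE_A";                 // table currently behind GRAPH
        String rebuild = standbyFor(live);       // table to load through GRAPH_TEMP
        System.out.println("Rebuild " + rebuild + ", then switch GRAPH to it");
    }
}
```

The value returned by standbyFor would then be used as "storage.hbase.table" for both the GRAPH_TEMP configuration and the later update of GRAPH.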


hadoopmarc@...
 

You really have to try this out and see. I can only answer from what I read in the ref docs.

> Do I need to ConfiguredGraphFactory.close(GRAPH) before I update its configuration?
The docs say the binding between graph name and graph instance is renewed every 20 seconds, so this may not be necessary.

> What happens to GRAPH_TEMP? Wouldn't it be still pointing to the same storage backend HBase table as GRAPH, i.e. to TABLE_B?
GRAPH_TEMP is just a name in the JanusGraphManager memory. It does not matter.

> if I want to reuse the same scheme, I'd have to have some logic that the next time around I need to renew GRAPH, I have GRAPH_TEMP talk to TABLE_A instead and then switch GRAPH to use TABLE_A, correct?
You are right. I would prefer straight versioning or a timestamp in the table name; otherwise the reuse of names will bite you some day. Of course, you would drop TABLE_A from the storage backend once it is no longer needed.
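
As a rough illustration of the timestamp idea, the sketch below builds a table name from a prefix and a UTC build time, and wraps it in the map that would be handed to updateConfiguration. It is plain Java with no JanusGraph dependency; the class name, method names and the naming pattern are assumptions of mine, not something the docs prescribe.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for naming each rebuild's storage table uniquely.
final class VersionedTableName {
    // Assumed pattern: <prefix>_<UTC timestamp>, e.g. GRAPH_20240102T030405Z
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'").withZone(ZoneOffset.UTC);

    static String tableNameFor(String prefix, Instant buildTime) {
        return prefix + "_" + STAMP.format(buildTime);
    }

    // Only the property that changes at switch-over goes into the update map.
    static Map<String, Object> switchOverConfig(String tableName) {
        Map<String, Object> map = new HashMap<>();
        map.put("storage.hbase.table", tableName);
        return map;
    }

    public static void main(String[] args) {
        String table = tableNameFor("GRAPH", Instant.now());
        Map<String, Object> update = switchOverConfig(table);
        System.out.println(update);
        // On the JanusGraph side, this map would then be passed on as:
        // ConfiguredGraphFactory.updateConfiguration("GRAPH", new MapConfiguration(update));
    }
}
```

Because every rebuild gets a fresh table name, no name is ever reused, and old tables can be dropped from the storage backend on whatever schedule suits you.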

Best wishes,   Marc