Slowing of JanusGraph


51kumarakhil@...
 

Hi, I have three JanusGraph (0.5.3) servers pointing to the same Bigtable instance, all with the same configuration, shown below.

Configurations:
I'm using ConfigurationManagementGraph; the properties file is below.

--------------------------------<janusgraph-bigtable-configurationgraph.properties>----------------------------------

gremlin.graph=org.janusgraph.core.ConfiguredGraphFactory

storage.backend=hbase
storage.hbase.ext.google.bigtable.instance.id=
storage.hbase.ext.google.bigtable.project.id=
storage.hbase.ext.hbase.client.connection.impl=com.google.cloud.bigtable.hbase2_x.BigtableConnection
graph.timestamps=MICRO
storage.lock.wait-time=100

graph.graphname=ConfigurationManagementGraph
storage.hostname=127.0.0.1

cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

--------------------------------------------------------------------------------------------------------------------------------------------


The gremlin-server.yaml file looks like this:
-------------------------------gremlin-server.yaml---------------------------------------------------------------------------------------------

host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 100000
channelizer: org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  graph: conf/janusgraph-inmemory.properties,
  ConfigurationManagementGraph: conf/janusgraph-bigtable-configurationgraph.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}
}}}


--------------------------------------------------------------------------------------------------------------------------------------------------------------




All the servers share the same configuration. I have also set up nginx in front of these servers. When I need to create a graph, I first connect to nginx, which forwards the connection to the most available JanusGraph server; that server creates the graph and stores it in Bigtable.
I can then access this graph from any of the three servers.

Up to this point everything works as expected.

Issue 1: Slowing of servers

Every day I generate a new graph with around 150K vertices and 250K edges. The first time, the servers generate the graph without problems, but on later runs execution slows down and at some point stops completely: the servers get stuck partway through and stop accepting requests. To work around this I have to restart the servers every time, which leads to the second issue.

Issue 2: Deleting graphs

While all the servers stay running, I can delete any graph that was created during that time. But as soon as I restart any one of the servers, I can no longer delete the graph: the restarted server continuously throws the error "Table Not Found". To resolve this I have to stop all the servers, delete all the graphs from Bigtable, and then restart the servers. After the servers create the first graph again, I am back at Issue 1.




hadoopmarc@...
 

Have you tried to increase JVM memory settings for JanusGraph Server? Also, check the settings for the database cache size, because I tend to remember that each graph has its own cache. Can you provide some numbers: number of graphs, database cache settings, JVM memory settings, etc.

Best wishes,    Marc


51kumarakhil@...
 

Hi Marc, thanks for the quick reply. Below are the details you asked for.

Max. Heap Size (Estimated): 3.85G

No of graphs: 7 to 8

Database-cache:
       cache.db-cache = true
       cache.db-cache-clean-wait = 20
       cache.db-cache-time = 180000
       cache.db-cache-size = 0.5
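
To put these numbers side by side, here is a rough sketch of how the cache setting relates to the heap. It assumes the rule from the JanusGraph configuration reference that a cache.db-cache-size value below 1 is read as a fraction of the JVM heap and a larger value as an absolute size in bytes, and it assumes the reported "3.85G" means GiB. The function name db_cache_bytes is made up for this illustration.

```python
GIB = 1024 ** 3  # assumption: the reported "3.85G" is read as GiB


def db_cache_bytes(db_cache_size: float, max_heap_bytes: int) -> int:
    """Approximate db-cache budget in bytes for a cache.db-cache-size value.

    Values below 1 are treated as a fraction of the maximum heap, larger
    values as an absolute byte count. How a value of exactly 1 is handled
    is an assumption of this sketch.
    """
    if db_cache_size <= 0:
        raise ValueError("cache.db-cache-size must be positive")
    if db_cache_size < 1:
        return round(db_cache_size * max_heap_bytes)
    return int(db_cache_size)


max_heap = round(3.85 * GIB)           # Max. Heap Size (Estimated): 3.85G
cache = db_cache_bytes(0.5, max_heap)  # cache.db-cache-size = 0.5

print(f"db-cache budget:     {cache / GIB:.3f} GiB")
print(f"rest of the heap:    {(max_heap - cache) / GIB:.3f} GiB")
print(f"db-cache-time:       {180000 / 60000:.0f} minutes")  # value is in ms
```

With these settings about half of the heap, a little over 1.9 GiB, can go to the database cache, and the other half has to cover transactions, the Gremlin script engine, and everything else in the server.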
 




hadoopmarc@...
 

Hmm, these figures seem perfectly reasonable. It seems I was also wrong about the db-cache heap region not being shared between graphs. So you have to gather more information about what is going wrong. Some ideas:

  1. Does restarting nginx make a difference? Communication between JanusGraph Server and Gremlin clients runs over websockets. Make sure there is no issue with communication channels being kept occupied.
  2. https://lists.lfaidata.foundation/g/janusgraph-users/topic/79935654#2886 This thread gives an nginx setup using a DNS virtual IP address. Possibly this makes a difference in websocket communication issues.
  3. Disable the db-cache and see if this makes a difference (this can only make a difference if there are many clients with transaction caches).
  4. Check the logs of JanusGraph Server for other indications.
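
For idea 1, one way to separate nginx from the servers themselves is to probe each JanusGraph server directly, bypassing nginx. The sketch below uses only the Python standard library and checks that port 8182 (the port from the gremlin-server.yaml above) accepts a TCP connection. It does not perform a websocket handshake or send a Gremlin request, so a server that accepts connections but is stuck internally would still show as reachable. The host names are placeholders, not taken from the setup described here.

```python
import socket


def can_connect(host: str, port: int = 8182, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def report(hosts, port: int = 8182) -> dict:
    """Probe every host directly and return {host: reachable}."""
    return {host: can_connect(host, port) for host in hosts}


# Hypothetical host names; replace them with the three real servers, e.g.:
#   report(["janusgraph-1.example.internal",
#           "janusgraph-2.example.internal",
#           "janusgraph-3.example.internal"])
```

If all three servers are reachable directly while requests through nginx hang, the proxy configuration is the more likely suspect; if a server is unreachable or stuck, its own logs (idea 4) are the next place to look.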

Best wishes,      Marc