Concurrent TimeoutException on connection to gremlin server remotely
sarthak...@...
Hi,

I have a Gremlin Server running v3.3.3, and I connect to it remotely to run my Gremlin queries via Java. Recently I've been bombarded with this error:

`org.apache.tinkerpop.gremlin.process.remote.RemoteConnectionException: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists`

Initially, when I faced this issue, restarting the Gremlin service made it work again, but that doesn't solve the problem anymore. I'm not sure what the issue is.

Here is my remote-objects.yaml file:

```
hosts: [fci-graph-writer-gremlin]
port: 8182
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry,org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3d0] }}
connectionPool: { channelizer: Channelizer.WebSocketChannelizer, maxContentLength: 81928192 }
```

gremlin-server.yaml

```
host: 0
port: 8182
scriptEvaluationTimeout: 120000
threadPoolWorker: 4
gremlinPool: 16
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  fci: conf/janusgraph-hbase.properties,
  insights: conf/janusgraph-insights-hbase.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}},
    scripts: [scripts/empty-sample.groovy],
    staticImports: ['org.opencypher.gremlin.process.traversal.CustomPredicates.*']}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry,org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
  - { className: org.opencypher.gremlin.server.op.cypher.CypherOpProcessor, config: { sessionTimeout: 28800000}}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: false, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 81928192
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}
```

pom.xml

```
<dependency>
    <groupId>org.janusgraph</groupId>
    <artifactId>janusgraph-all</artifactId>
    <version>0.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-driver</artifactId>
    <version>3.3.3</version>
</dependency>
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>tinkergraph-gremlin</artifactId>
    <version>3.3.3</version>
</dependency>
```
Stephen Mallette <spmal...@...>
It's hard to say what the problem could be given your description. All I can say given the information you've provided is that the driver has marked all of your hosts as "dead" for some reason. That could have happened for any number of reasons. Assuming you knew the server was running and are certain of network stability, then I guess I'd next look at server logs to see what kinds of errors were occurring just prior to the driver returning this error. As an aside, I see that you're using 3.3.3 for the TinkerPop driver and I don't remember exactly what conditions triggered a "dead" host back then. I'm pretty sure there have been some refinements to that decision making in more recent releases. On Sat, Oct 19, 2019 at 5:55 AM <sarthak...@...> wrote:
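To rule out basic connectivity before digging into the server logs, a plain TCP check against the host and port from `remote-objects.yaml` can help. The following is only a minimal sketch: the class name `PortCheck` and the 3-second timeout are made up for illustration, and a successful connect shows only that the port is reachable, not that the WebSocket handshake or the driver configuration is correct.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of TinkerPop: checks raw TCP reachability only.
final class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers unknown hosts, refused connections, and timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the remote-objects.yaml posted in this thread.
        String host = args.length > 0 ? args[0] : "fci-graph-writer-gremlin";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8182;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

If this reports the port as unreachable from the client machine, the problem is likely in the network or the server process rather than in the driver settings.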
sarthak...@...
Hi Stephen,

Thanks for replying. I was able to solve this issue, but I'm not certain how it was causing an error. My Java code connects using the `remote-objects.yaml` file, and I had recently added the `channelizer` property to its `connectionPool` section. After removing it, the system is back to normal. I got the value from the docs: http://tinkerpop.apache.org/docs/3.2.9/reference/#connecting-via-remotegraph (Connecting via Java > Configuration).
Stephen Mallette <spmal...@...>
Yeah... I think the documentation misled you a bit. You perhaps took that "Default" written there very literally without considering the "Description", which describes that field as: "The fully qualified classname of the client Channelizer that defines how to connect to the server." So "Channelizer.WebSocketChannelizer" is really just the class name, not the FQCN. There wasn't enough room to put that whole thing in that "Default" column. If you'd done:

    connectionPool: { channelizer: org.apache.tinkerpop.gremlin.driver.Channelizer$WebSocketChannelizer }

it would be working. What isn't so good is that you didn't get an error message for that configuration problem.

On Mon, Oct 21, 2019 at 8:59 AM <sarthak...@...> wrote:
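The difference between a nested class's simple name and its fully qualified (binary) name can be reproduced with plain reflection. The sketch below is a standalone illustration, not driver code: it assumes the driver resolves the configured string with something like `Class.forName` (not confirmed in this thread), it uses `java.util.Map$Entry` as a stand-in for `Channelizer$WebSocketChannelizer`, and `NestedClassNameDemo` is a made-up name.

```java
// Illustration only: java.util.Map.Entry stands in for the driver's nested
// Channelizer.WebSocketChannelizer class.
final class NestedClassNameDemo {

    // Returns true if the given string resolves to a class via reflection.
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Binary name of a nested class: package, outer class, '$', nested class.
        System.out.println(canLoad("java.util.Map$Entry"));
        // Source-style name with a dot before the nested class is not a binary name.
        System.out.println(canLoad("java.util.Map.Entry"));
        // No package at all, analogous to "Channelizer.WebSocketChannelizer".
        System.out.println(canLoad("Map.Entry"));
    }
}
```

Only the first form, with the package and the `$` separator, resolves, which matches the shape of the value suggested above.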
sarthak...@...
OK, got it. Thanks, Stephen. I have another query that maybe you can help with. I have multiple graphs defined in my `gremlin-server.yaml`, and to initialise them at run time I reference these graphs in my empty-sample.groovy. The issue is that sometimes g2 doesn't get initialised at runtime, and if I restart my server, it works. The only difference between the properties files is `storage.hbase.table`. What could be the reason behind this random behaviour?
Stephen Mallette <spmal...@...>
I'm reading this as a JanusGraph-hbase specific sort of question and I'm not sure what the issue might be there. When you say "g2" doesn't get initialized, do you get an error in the server startup? or is it some other kind of error? On Mon, Oct 21, 2019 at 10:33 AM <sarthak...@...> wrote:
sarthak...@...
Well, I don't have the error message right now. I'll post it when I run into that scenario again. The Gremlin Server starts, but the graph isn't initialised, i.e. g2 isn't available for querying data. After restarting the service with the same properties, it works fine, so I can't understand this random behaviour. Also, should the table named in `storage.hbase.table` in the properties file provided to gremlin-server.yaml already exist in HBase? We first start the service, then insert the data into HBase, which creates the table, and then query the results.
sarthak...@...
Hi Stephen,
Below is the error I get for g2:
Stephen Mallette <spmal...@...>
That seems to be a failure resulting from a request - is that right? I'm wondering if there is an error at server startup, when the script executes, that you're missing. Or do the startup logs look clean?

On Mon, Oct 21, 2019 at 2:38 PM <sarthak...@...> wrote: Hi Stephen,
sarthak...@...
That's all the log I have right now. The startup looks clean to me. And this isn't the failure of a request, at least not a request from our (client) side. If Gremlin Server or HBase is sending any request, like a ping to verify the connection, then I'm not sure about that. But what could be the reason for this random behaviour? If the properties were wrong, then it shouldn't start or configure at any point. But it is just random.
Stephen Mallette <spmal...@...>
Hmm, I'm not sure how the HttpGremlinEndpointHandler would log an error if it didn't get a request. Nothing in TinkerPop that I can think of would issue a request to that. We don't even recommend that folks use that, really. Anyway, that aside, I'm not aware of situations where Gremlin Server will have a successful init script run only to later lose a global binding. It's actually "hard" to get rid of a reference bound to the ScriptEngine once it's in there. I guess I would try to do some debugging in the init script to try to figure out what's happening. Perhaps verify that "g2" actually works at time of init? Like, maybe:

    g = graph2.traversal()
    ctx.logger.info("found a vertex: " + g.V().limit(1).next())
    globals << [g2: g, g1: graph1.traversal()]

On Tue, Oct 22, 2019 at 8:50 AM <sarthak...@...> wrote: