
Re: P.neq() predicate uses wrong ES mapping

hadoopmarc@...
 

Hi Sergej,

The example string "??" you used is not an ordinary string: apparently, somewhere in Elasticsearch it is interpreted as a wildcard. See my transcript below with a different property value, where the index behaves according to your and my expectations. I made some attempts to escape the question marks in your example string, like "\\?", but was not successful. The JanusGraph documentation is very quiet on the use of wildcards for indexing backends.
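For the record, ? and * are wildcard characters in the Lucene/Elasticsearch query_string syntax, and Lucene's documented escape is a backslash before each reserved character. A minimal sketch of such escaping in plain Python (just string handling; whether an escaped string survives the JanusGraph indexing layer intact is exactly what remains unclear here):

```python
# Reserved characters of the Lucene/Elasticsearch query_string syntax.
# Note: && and || are two-character operators; this simple version just
# escapes the characters individually, which is harmless for this purpose.
LUCENE_RESERVED = set('+-!(){}[]^"~*?:\\/&|')

def escape_query_string(value: str) -> str:
    """Backslash-escape Lucene query_string reserved characters."""
    return ''.join('\\' + c if c in LUCENE_RESERVED else c for c in value)

print(escape_query_string('??'))  # prints \?\?
print(escape_query_string('y1'))  # prints y1
```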

Best wishes,   Marc

bin/janusgraph.sh start
bin/gremlin.sh
graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')

mgmt = graph.openManagement()
index = mgmt.buildIndex("indexname", Vertex.class)
xproperty = mgmt.makePropertyKey("x").dataType(String.class).make();
yproperty = mgmt.makePropertyKey("y").dataType(String.class).make();
index.addKey(xproperty, Mapping.TEXTSTRING.asParameter())
index.addKey(yproperty, Mapping.TEXTSTRING.asParameter())
index.buildMixedIndex("search")
mgmt.commit()
ManagementSystem.awaitGraphIndexStatus(graph, 'indexname').status(SchemaStatus.REGISTERED, SchemaStatus.ENABLED).call()
==>GraphIndexStatusReport[success=true, indexName='indexname', targetStatus=[REGISTERED, ENABLED], notConverged={}, converged={x=ENABLED, y=ENABLED}, elapsed=PT0.017S]

g = graph.traversal()
g.addV('Some').property('x', 'x1').property('y', 'y1')
g.addV('Some').property('x', 'x2').property('y', '??')
g.tx().commit()


Expected behaviour:
g.V().has("x","x1").has("y",P.neq("y1"))
===>
g.V().has("x","x1").has("y",P.eq("y1"))
==>v[4224]
g.V().has("x","x1").has("y",P.neq("y4"))
==>v[4224]

Undocumented behaviour:
g.V().has("x","x2").has("y",P.neq("??"))
==>v[4264]
g.V().has("x","x2").has("y",P.eq("??"))
==>v[4264]
g.V().has("x","x2").has("y",P.neq("y4"))
==>v[4264]


FoundationDB adapter working status

cedtwo
 

Hi. Back in January I opened an issue on the FDB adapter GitHub page to bring attention to issues I had following the README.md. Despite framing the question as issues following the documentation, I was left feeling that the adapter was simply not functional with the versions of JanusGraph/FDB stated in the compatibility matrix. I put off working on this side of the app for the last few months, hoping a response would eventually assist in resolving the issues; however, coming back, I find I have yet to receive one, and the adapter remains just as I left it. Can anyone clarify whether the adapter is still supported and functional? Or should I consider another storage backend?

Thanks guys.


Re: Best way to load exported medium-sized graphs

carlos.bobed@...
 

Hi Marc, 

from the gremlin console, I get: 
.with(IO.reader, IO.graphml).read().iterate()] - Batch too large
org.apache.tinkerpop.gremlin.jsr223.console.RemoteException: Batch too large
at org.apache.tinkerpop.gremlin.console.jsr223.DriverRemoteAcceptor.submit(DriverRemoteAcceptor.java:184)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:110)
at org.apache.tinkerpop.gremlin.console.Console$_executeInShell_closure19.doCall(Console.groovy:419)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at groovy.lang.Closure.call(Closure.java:405)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:2246)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:2226)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:2276)
at org.codehaus.groovy.runtime.dgm$199.doMethodInvoke(Unknown Source)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
at org.apache.tinkerpop.gremlin.console.Console.executeInShell(Console.groovy:396)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:163)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:502)


Meanwhile, in the server log I get, on the one hand, a lot of warnings such as:
1692178 [JanusGraph Cluster-nio-worker-1] WARN com.datastax.driver.core.RequestHandler - Query '[20 statements, 80 bound values] BEGIN UNLOGGED BATCH INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column1,value) VALUES (:key,:column1,:value) USING TIMESTAMP :timestamp; INSERT INTO janusgraph.edgestore (key,column... [truncated output]' generated server side warning(s): Batch of prepared statements for [janusgraph.edgestore] is of size 7626, exceeding specified threshold of 5120 by 2506.


And finally an exception about a temporary failure in storage backend: 
1692180 [gremlin-server-session-1] INFO org.janusgraph.diskstorage.util.BackendOperation - Temporary exception during backend operation [CacheMutation]. Attempting backoff retry.
org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at io.vavr.API$Match$Case0.apply(API.java:3174)
at io.vavr.API$Match.of(API.java:3137)
at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$0(CQLKeyColumnValueStore.java:123)
at org.janusgraph.diskstorage.cql.CQLStoreManager.mutateManyUnlogged(CQLStoreManager.java:526)
at org.janusgraph.diskstorage.cql.CQLStoreManager.mutateMany(CQLStoreManager.java:457)
at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStoreManager.mutateMany(ExpectedValueCheckingStoreManager.java:79)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:94)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:91)
at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:91)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:133)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.mutate(CacheTransaction.java:86)
at org.janusgraph.diskstorage.keycolumnvalue.cache.KCVSCache.mutateEntries(KCVSCache.java:65)
at org.janusgraph.diskstorage.BackendTransaction.mutateEdges(BackendTransaction.java:200)
at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(StandardJanusGraph.java:628)
at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:731)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1438)
at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph$GraphTransaction.doCommit(JanusGraphBlueprintsGraph.java:297)
at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:104)
at org.apache.tinkerpop.gremlin.structure.io.graphml.GraphMLReader.readGraph(GraphMLReader.java:132)
at org.apache.tinkerpop.gremlin.process.traversal.step.sideEffect.IoStep.read(IoStep.java:132)
at org.apache.tinkerpop.gremlin.process.traversal.step.sideEffect.IoStep.processNextStart(IoStep.java:110)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:128)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:38)
at org.apache.tinkerpop.gremlin.process.traversal.Traversal.iterate(Traversal.java:207)


... 
caused by: 

Caused by: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at io.vavr.control.Try.of(Try.java:62)
at io.vavr.concurrent.FutureImpl.lambda$run$2(FutureImpl.java:199)
... 5 more
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at com.datastax.driver.core.Responses$Error.asException(Responses.java:181)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:215)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
at com.datastax.driver.core.RequestHandler.access$2600(RequestHandler.java:61)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:1011)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:814)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1262)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1180)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)


I'll keep the logs this time just in case more detail is required.

Best, 

Carlos 


Re: P.neq() predicate uses wrong ES mapping

sergeymetallic@...
 

Hi Marc,

something like this

var index = janusGraphManagement.buildIndex("indexname", org.apache.tinkerpop.gremlin.structure.Vertex.class)
var xproperty = janusGraphManagement.makePropertyKey("x").dataType(String.class).make();
var yproperty = janusGraphManagement.makePropertyKey("y").dataType(String.class).make();

index.addKey(xproperty, Mapping.TEXTSTRING.asParameter())
index.addKey(yproperty, Mapping.TEXTSTRING.asParameter())
 


Re: Best way to load exported medium-sized graphs

hadoopmarc@...
 

Hi Carlos,

I read the preceding discussion with Stephen Mallette, which says: "From the logs, while loading this graph, the Cassandra driver is almost always warning that all batches are over the limit of 5120 (which I haven't found yet where to modify ...)."
A complete stacktrace would help indeed, but it strikes me that 5120 equals 20 x 256, while the cassandra cql driver options have the following defaults:

  • storage.cql.batch-statement-size: the number of statements in each batch (Integer, default 20)
  • storage.cql.remote-max-requests-per-connection: the maximum number of requests per connection for the remote datacenter (Integer, default 256)
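If that is indeed the mechanism, the first of these options can be lowered in the graph's properties file; a sketch (the value 10 is illustrative, not a tested recommendation):

```properties
# Reduce the number of statements per CQL batch (JanusGraph default: 20).
storage.cql.batch-statement-size=10
```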


Best wishes,    Marc


Best way to load exported medium-sized graphs

cbobed <cbobed@...>
 

Hi all,

I'm trying to load a GraphML export into JanusGraph 0.5.3. It is not that big (1.68M nodes, 8.8M edges). However, I reach a point where the TinkerPop layer tells me that the batch is too large, and it crashes (I suspect that there might be an edge with far too much information ... but finding it is difficult).

I've tried to split the GraphML file into non-overlapping partitions, but JanusGraph does not seem to honor the IDs (I'm not actually sure whether they are dropped at the TinkerPop or the JanusGraph level), and when I reach the split edges part, it inserts new nodes for both source and target.

Has anyone faced this GraphML loading problem before? Should I try to get a GraphSON-translated version of my exported graph to run everything more smoothly?

What are the recommendations for dealing with this kind of loading? I'm trying to avoid implementing my own graph loader at this point.

Thank you very much in advance,

Best,

Carlos Bobed


Re: P.neq() predicate uses wrong ES mapping

hadoopmarc@...
 

Hi Sergey,

I think I see your point, but for completeness can you be explicit on step 1) and specify your mgmt.buildIndex() statements?

Best wishes, 

Marc


Re: How to change GLOBAL_OFFLINE configuration when graph can't be instantiated

hadoopmarc@...
 

Hi Toom,

OK, I tried two things. I start with the janusgraph-full-0.5.3 distribution and run (this has gremlin server running with conf/janusgraph-cql-es.properties):
$ bin/janusgraph.sh start

Now I start a gremlin console and I do:

1. graph = JanusGraphFactory.open('conf/janusgraph-cql-es2.properties')
In this case I made a copy of conf/janusgraph-cql-es.properties and gave elasticsearch a non-existing ip address. This gives the following stacktrace:

17:32:19 WARN  org.janusgraph.diskstorage.es.rest.RestElasticSearchClient  - Unable to determine Elasticsearch server version. Default to SEVEN.
java.net.ConnectException: Connection refused
    at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:823)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
    at org.janusgraph.diskstorage.es.rest.RestElasticSearchClient.getMajorVersion(RestElasticSearchClient.java:137)
    at org.janusgraph.diskstorage.es.rest.RestElasticSearchClient.<init>(RestElasticSearchClient.java:117)
    at org.janusgraph.diskstorage.es.rest.RestClientSetup.getElasticSearchClient(RestClientSetup.java:107)
    at org.janusgraph.diskstorage.es.rest.RestClientSetup.connect(RestClientSetup.java:75)
    at org.janusgraph.diskstorage.es.ElasticSearchSetup$1.connect(ElasticSearchSetup.java:51)
    at org.janusgraph.diskstorage.es.ElasticSearchIndex.interfaceConfiguration(ElasticSearchIndex.java:445)
    at org.janusgraph.diskstorage.es.ElasticSearchIndex.<init>(ElasticSearchIndex.java:332)
   (...)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
    at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:168)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234)
    at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:502)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
    at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174)
    at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
    at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
    at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
    at java.lang.Thread.run(Thread.java:748)
Could not instantiate implementation: org.janusgraph.diskstorage.es.ElasticSearchIndex

This sounds like the stacktrace you encountered. It does show that janusgraph uses the local properties file. The GLOBAL_OFFLINE looks more like a warning that you cannot use heterogeneous indexing backend configs and get consistent results. Are you sure your corrected configs are right?

2. graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')
This simply opens the graph without using the indexing backend. So, it is possible to still open the graph.

Best wishes,

Marc


Re: JanusGraph-specific Predicates via Gremlin Language Variants (Python)

Florian Hockmann
 

Hi,

the problem is probably simply that the Python driver doesn't support serialization for JanusGraph predicates with GraphBinary. So, you would need to write your own GraphBinary serializer for JanusGraph predicates in Python. janusgraph-python could provide support for this in the future, but right now it only supports GraphSON, and even that is not in a final state.

 

There is currently not much progress in janusgraph-python. So, if you want to help improve the JanusGraph support in Python, then any contributions would of course be very welcome.

 

From: janusgraph-users@... <janusgraph-users@...> On behalf of florian.caesar via lists.lfaidata.foundation
Sent: Friday, 9 April 2021 09:27
To: janusgraph-users@...
Subject: [janusgraph-users] JanusGraph-specific Predicates via Gremlin Language Variants (Python)

 

Hi, I'm trying to use JanusGraph's full-text predicates in the gremlin-python client library. Using GraphSON serializer, I can use a predicate with e.g. "textContains" as the operator and it works since JanusGraphIoRegistryV1d0 registers its custom deserializer for P objects:

addDeserializer(P.class, new JanusGraphPDeserializerV2d0());

However, as of v0.5.3, JanusGraph does not register any deserializers for GraphBinary (though that feature is already on the master branch). This means that when I submit the same exact traversal with P("textContains", "string") in graphbinary format I get:

org.apache.tinkerpop.gremlin.server.handler.OpSelectorHandler  - Invalid OpProcessor requested [null]

I presume this is because the "textContains" predicate isn't registered. Weirdly enough, in my Groovy console, the same traversal works fine even though it also uses graphbinary (according to the configuration).

There are a couple options here and I don't have enough information on any of them, so I would appreciate input:

    1. Figure out what the Groovy console is doing differently and use that in the Python library
    2. Use a Docker image from master branch and adapt the Python library to use the new custom JanusgraphP type in graphbinary
    3. Use two separate clients with different serializations depending on which traversal I need to run (yuck)

Note: I've already tested https://github.com/JanusGraph/janusgraph-python, it does the same thing I do manually and thus only works with GraphSON.
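For context, in GraphSON 3.0 a predicate travels as a small typed JSON object, which is why the server-side JanusGraphIoRegistry can map an operator name like "textContains" back to a JanusGraph predicate. A rough sketch of that payload shape in plain Python (the exact "g:P" field layout is my assumption from the TinkerPop GraphSON format, not taken from janusgraph-python):

```python
import json

def graphson_p(operator: str, value) -> dict:
    """Build a GraphSON 3.0-style representation of a P predicate.

    The "g:P" layout follows the TinkerPop GraphSON format; JanusGraph's
    GraphSON deserializer is what resolves non-standard operator names
    such as "textContains" on the server side.
    """
    return {"@type": "g:P", "@value": {"predicate": operator, "value": value}}

payload = graphson_p("textContains", "string")
print(json.dumps(payload))
```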


Re: How to change GLOBAL_OFFLINE configuration when graph can't be instantiated

toom@...
 

Thank you for your reply. In my case I can remove the database and create a new one. But what should I do if I want to retrieve data from a JanusGraph database in which the configured index backend is not available ?

Is there any way to disable index backend without instantiating the database ? Or to make the index errors not fatal during the instantiation (in order to change configuration) ?

Toom.


P.neq() predicate uses wrong ES mapping

sergeymetallic@...
 

Janusgraph setup:
Storage backend: Scylla 3
Indexing backend: Elasticsearch 6
JG version: 0.5.3

Steps to reproduce:

1) Create a vertex with two fields mapped in ES index as TEXTSTRING("x" and "y")
2) Insert a node with values: x="anyvalue", y="??"
3) Execute these queries:
  • g.V().has("x","anyvalue").has("y",P.neq("??"))
  • g.V().has("x","anyvalue").has("y",P.eq("??"))

Expected result:
  • First query returns an empty set
  • Second query returns one node

Actual result:
  • Both queries return the same result


Observation:
Looks like the issue is in this line https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/ElasticSearchIndex.java#L959
The code checks for Cmp.EQUAL but not for Cmp.NOT_EQUAL, so that in the case of NOT_EQUAL the tokenized field is used.
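As an illustration of why the tokenized field misbehaves for "??": a standard text analyzer drops punctuation-only input entirely, leaving no tokens to compare against. A rough Python sketch with a crude stand-in for the Elasticsearch standard analyzer (the regex is a simplification, not the real analyzer):

```python
import re

def standard_analyze(value: str) -> list:
    """Crude stand-in for the ES standard analyzer: lowercase word tokens."""
    return re.findall(r'\w+', value.lower())

# "??" produces no tokens at all, so a predicate evaluated against the
# tokenized (TEXT) half of a TEXTSTRING field cannot distinguish
# eq("??") from neq("??"); only the untokenized STRING half can.
print(standard_analyze('y1'))  # prints ['y1']
print(standard_analyze('??'))  # prints []
```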


JanusGraph-specific Predicates via Gremlin Language Variants (Python)

florian.caesar
 

Hi, I'm trying to use JanusGraph's full-text predicates in the gremlin-python client library. Using GraphSON serializer, I can use a predicate with e.g. "textContains" as the operator and it works since JanusGraphIoRegistryV1d0 registers its custom deserializer for P objects:

addDeserializer(P.class, new JanusGraphPDeserializerV2d0());

However, as of v0.5.3, JanusGraph does not register any deserializers for GraphBinary (though that feature is already on the master branch). This means that when I submit the same exact traversal with P("textContains", "string") in graphbinary format I get:

org.apache.tinkerpop.gremlin.server.handler.OpSelectorHandler  - Invalid OpProcessor requested [null]

I presume this is because the "textContains" predicate isn't registered. Weirdly enough, in my Groovy console, the same traversal works fine even though it also uses graphbinary (according to the configuration).

There are a couple options here and I don't have enough information on any of them, so I would appreciate input:

    1. Figure out what the Groovy console is doing differently and use that in the Python library
    2. Use a Docker image from master branch and adapt the Python library to use the new custom JanusgraphP type in graphbinary
    3. Use two separate clients with different serializations depending on which traversal I need to run (yuck)

Note: I've already tested https://github.com/JanusGraph/janusgraph-python, it does the same thing I do manually and thus only works with GraphSON.


Re: ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Application has been killed. Reason: All masters are unresponsive! Giving up.

hadoopmarc@...
 

Hi Vinayak,

Your properties file says:

spark.master=spark://127.0.0.1:7077

Do you have a spark standalone cluster running? Does the spark master reside on 127.0.0.1 and does it listen on 7077?

With spark on localhost, you can also simply take "read-cql.properties", which uses all cores on localhost for running spark executors.

Best wishes,    Marc


Re: Count Query Optimisation

hadoopmarc@...
 

Hi Vinayak,

For other readers, see also this other recent thread.

A couple of remarks:
  • For the separate edge count, does it make any difference if you select the edges by label rather than by property, that is g.E().hasLabel('Edge2').dedup().count()? You can see in the JanusGraph data model that the edge label is somewhat easier to access than its properties.
  • If you use an indexing backend, it is also possible to do some simple counts against the index, but this will not help you out for your original query.
  • You also asked about using Spark. Most of the time, OLAP performance is (still) disappointing. But if you need more details you will have to show what you have tried and what problems you encountered.

Best wishes,     Marc


ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Application has been killed. Reason: All masters are unresponsive! Giving up.

Vinayak Bali
 

Hi All, 

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cql-standalone-cluster.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().has('title','Plant').count()
11:09:18 WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
11:09:20 WARN  org.apache.spark.util.Utils  - Your hostname, ip-xx-xx-xx-xx resolves to a loopback address: 127.0.0.1; using xx.xx.xx.xx instead (on interface ens5)
11:09:20 WARN  org.apache.spark.util.Utils  - Set SPARK_LOCAL_IP if you need to bind to another address
11:10:25 ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend  - Application has been killed. Reason: All masters are unresponsive! Giving up.
11:10:25 WARN  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend  - Application ID is not initialized yet.
11:10:25 WARN  org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint  - Drop UnregisterApplication(null) because has not yet connected to master
11:10:25 WARN  org.apache.spark.metrics.MetricsSystem  - Stopping a MetricsSystem that is not running
11:10:26 ERROR org.apache.spark.SparkContext  - Error initializing SparkContext.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52)
at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:313)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
11:10:26 ERROR org.apache.spark.scheduler.AsyncEventQueue  - Listener AppStatusListener threw an exception
java.lang.NullPointerException
at org.apache.spark.status.AppStatusListener.onApplicationEnd(AppStatusListener.scala:157)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
java.lang.NullPointerException
Type ':help' or ':h' for help.
Display stack trace? [yN]

Hadoop: 3.3.0
Spark: 2.2.2
Scala: 2.11.2
Janusgraph: 0.5.2

I referred to the below documentation for the configuration files:

Thanks & Regards,
Vinayak


Re: How to change GLOBAL_OFFLINE configuration when graph can't be instantiated

hadoopmarc@...
 

Hi,
JanusGraph keeps a record of open instances that sometimes is not updated properly. You can clean it with the methods here:
https://docs.janusgraph.org/advanced-topics/recovery/#janusgraph-instance-failure
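The relevant management calls from that page look roughly as follows (a sketch in the gremlin console; the instance id must be taken from your own getOpenInstances() output, and the current instance, marked '(current)', must not be closed):

```groovy
mgmt = graph.openManagement()
mgmt.getOpenInstances()                   // lists all registered instance ids
mgmt.forceCloseInstance('<instanceId>')   // evict a stale instance by its id
mgmt.commit()
```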

Maybe it is no problem if your graph is dropped entirely, so you can also check:
https://docs.janusgraph.org/basics/common-questions/#dropping-a-database
After dropping, the graph can be recreated with the right configs.

Best wishes,     Marc


How to change GLOBAL_OFFLINE configuration when graph can't be instantiated

toom@...
 

Hi,

I am in a case where the index backend has been incorrectly configured to Elasticsearch. Now, when I try to instantiate my graph database, I get a "ConnectException: Connection refused", even if I set('index.search.backend', 'lucene') in JanusGraphFactory.
The index backend setting is GLOBAL_OFFLINE; I should update it while the graph is instantiated, but how can I change it if the instantiation fails?

Thanks


Count Query Optimisation

Vinayak Bali
 

Hi All, 

The Data Model of the graph is as follows:

Nodes:

Label: Node1, count: 130K
Label: Node2, count: 183K
Label: Node3, count: 437K
Label: Node4, count: 156

Relations:

Node1 to Node2 Label: Edge1, count: 9K
Node2 to Node3 Label: Edge2, count: 200K
Node2 to Node4 Label: Edge3, count: 71K
Node4 to Node3 Label: Edge4, count: 15K
Node4 to Node1 Label: Edge5 , count: 1K

The Count query used to get vertex and edge count :

g2.V().has('title', 'Node2').aggregate('v').outE().has('title','Edge2').aggregate('e').inV().has('title', 'Node3').aggregate('v').select('v').dedup().as('vertexCount').select('e').dedup().as('edgeCount').select('vertexCount','edgeCount').by(unfold().count())

This query takes around 3.5 mins to execute and the output returned is as follows:
[{"vertexCount":383633,"edgeCount":200166}]

The problem is traversing the edges takes more time.
g.V().has('title','Node3').dedup().count() takes 3 sec to return 437K nodes.
g.E().has('title','Edge2').dedup().count() takes 1 min to return 200K edges.

In some cases, subsequent calls are faster, due to cache usage. 
I also considered the in-memory backend, but the data is large and I don't think that will work. Is there any way to cache the result on the first execution of the query? Or any approach to load the graph from the CQL backend into memory to improve performance?

Please help me to improve the performance, count query should not take much time.

Janusgraph : 0.5.2
Storage: Cassandra cql
The server specification is high and that is not the issue.

Thanks & Regards,
Vinayak



Re: Traversal binding of dynamically created graphs are not propagated in multi-node cluster

hadoopmarc@...
 

Hi Anton,

No, my last post only concerned the gremlin server on port 8185, although the
first line of step 3 should have been (this was a hand-editing error):
    :remote connect tinkerpop.server conf/remote8185.yaml session
The gremlin server on port 8182 from janusgraph.sh is ignored.

Anyway, the link to the successful test on github actually held the key to some
more insight. It turns out that our issue (bindings are not automatically
generated after max 20 seconds) is absent if you use the sequence
createTemplateConfiguration() and create(). Unfortunately, this only holds on
the same server where the new configuration was created.
So, I will report this all as an issue and you can comment on it if necessary.

Best wishes,    Marc
