
Re: Janusgraph upgrade 0.5.2 --> 0.6.0 | CQL issue

hadoopmarc@...
 

Hi Pawan,

Caused by: com.datastax.oss.driver.api.core.servererrors.UnauthorizedException: Unauthorized. User graph_user has no CREATE permission on <all keyspaces> or any of its parents
CREATE KEYSPACE IF NOT EXISTS graphDataTable WITH replication={'replication_factor':1,'class':'SimpleStrategy'}
This part of your stack trace suggests that something changed on your Cassandra cluster or with the graph_user. So, check the CREATE permission with your Cassandra admin, or try to create the keyspace manually (e.g. via cqlsh).
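
For example, something along these lines in cqlsh (run as an administrative user) should cover both options; the keyspace name and replication settings are taken from your stack trace, so adjust them to your setup:

-- either grant the missing permission to graph_user ...
GRANT CREATE ON ALL KEYSPACES TO graph_user;

-- ... or create the keyspace up front so JanusGraph does not need CREATE
CREATE KEYSPACE IF NOT EXISTS graphDataTable
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
GRANT ALL PERMISSIONS ON KEYSPACE graphDataTable TO graph_user;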

Also note that a few CQL properties changed in 0.6.0 (but this is probably not related):
https://docs.janusgraph.org/v0.6/changelog/

Best wishes,     Marc


Re: GraphTraversal Thread Stuck

Boxuan Li
 

Hi Sujay,

I am not sure about the root cause (it might be a JanusGraph bug or a DataStax CQL driver bug), but you could try JanusGraph 0.6.0 and disable the `storage.cql.executor-service.enabled` option (https://docs.janusgraph.org/configs/configuration-reference/#storagecqlexecutor-service), so that CQLStoreManager does not use an internal thread pool as it does in 0.5.3. If the problem still exists, I would argue it is more likely to be a bug in the DataStax CQL driver.
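
For reference, disabling it is a single line in your JanusGraph properties file (untested sketch; adjust the file name to your setup):

# e.g. in janusgraph.properties
storage.cql.executor-service.enabled=false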

Best,
Boxuan


GraphTraversal Thread Stuck

Sujay Bothe <ssbothe3@...>
 

Hello,


Janus version -  0.5.3
Cassandra version - 3.11.4


I am facing an issue where a GraphTraversal hasNext() call got stuck.
The thread from which the traversal was invoked is still stuck; below is its stack trace.


"MYTHREAD" #80 prio=5 os_prio=0 tid=0x00007f74f82f9000 nid=0x1ab9 in Object.wait() [0x00007f746a8ec000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at io.vavr.concurrent.FutureImpl$$Lambda$226/250860313.run(Unknown Source)
        at io.vavr.control.Try.run(Try.java:105)
        at io.vavr.concurrent.FutureImpl.await(FutureImpl.java:114)
        - locked <0x00000000c42f2e60> (a java.lang.Object)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.interruptibleWait(CQLKeyColumnValueStore.java:308)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:289)
        at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:76)
        at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$1.call(KCVSConfiguration.java:97)
        at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$1.call(KCVSConfiguration.java:94)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:147)
        at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:161)
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:158)
        at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration.get(KCVSConfiguration.java:94)
        at org.janusgraph.graphdb.tinkerpop.JanusGraphVariables.get(JanusGraphVariables.java:46)
        at MyObjectStrategy.apply(MyObjectStrategy.java:411)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversalStrategies.applyStrategies(DefaultTraversalStrategies.java:88)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.applyStrategies(DefaultTraversal.java:124)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:196)
        at MyTraversal.get(MyTraversal.java:300)


As you can see above, the CQLKeyColumnValueStore is waiting on the future object. 
It has been stuck for a couple of days now.

I went through the code of JanusGraph's CQLKeyColumnValueStore.java.
It executes the slice query using the executorService thread pool of CQLStoreManager.java.
The thread names for that thread pool start with 'CQLStoreManager'.


I checked the state of all 20 CQLStoreManager threads in my Java process, and all of them are in the WAITING (parking) state:

CQLStoreManager[00]" #191278 daemon prio=5 os_prio=0 tid=0x00007f73f837e000 nid=0x3f0a waiting on condition [0x00007f73cd46c000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c3c26ff8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


So I am not able to understand what happened to the CQLStoreManager thread that was supposed to complete the future on which
CQLKeyColumnValueStore.interruptibleWait(CQLKeyColumnValueStore.java:308) is waiting.

One additional piece of information I want to share: this happened around the time one of the Cassandra nodes was disconnected,
and the traversal was failing with TemporaryBackendException (Enough replicas not available).
Afterwards the node rejoined the cluster and quorum was restored, yet the thread remained stuck.

Can someone please help here?
Do let me know if any additional information is needed. 


Thanks,
Sujay Bothe


Janusgraph upgrade 0.5.2 --> 0.6.0 | CQL issue

Pawan Shriwas
 

Hi Everyone,

I am facing an issue with JanusGraph 0.6.0 while upgrading my JanusGraph console/server/embedded setup from 0.5.2 to 0.6.0. Please see the stack trace below.

Please note that the same graph configuration was working on version 0.5.2.

Please suggest how I can recover from this.

Stack trace:
5858 [main] WARN  org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager  - Graph [graph] configured at [/etc/opt/janusgraph/janusgraph.properties] could not be instantiated and will not be available in Gremlin Server.  GraphFactory message: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
        at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:84)
        at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:72)
        at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:106)
        at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.addGraph(DefaultGraphManager.java:63)
        at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684)
        at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.<init>(DefaultGraphManager.java:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:84)
        at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:124)
        at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:87)
        at org.janusgraph.graphdb.server.JanusGraphServer.start(JanusGraphServer.java:85)
        at org.janusgraph.graphdb.server.JanusGraphServer.main(JanusGraphServer.java:53)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:80)
        ... 14 more
Caused by: java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cql.CQLStoreManager
        at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:79)
        at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:525)
        at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:489)
        at org.janusgraph.graphdb.configuration.builder.GraphDatabaseConfigurationBuilder.build(GraphDatabaseConfigurationBuilder.java:64)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:176)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:147)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:127)
        ... 19 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:73)
        ... 25 more
Caused by: org.janusgraph.diskstorage.PermanentBackendException: Couldn't initialize CQLStoreManager
        at org.janusgraph.diskstorage.cql.CQLStoreManager.<init>(CQLStoreManager.java:155)
        at org.janusgraph.diskstorage.cql.CQLStoreManager.<init>(CQLStoreManager.java:116)
        ... 30 more
Caused by: com.datastax.oss.driver.api.core.servererrors.UnauthorizedException: Unauthorized. User graph_user has no CREATE permission on <all keyspaces> or any of its parents
CREATE KEYSPACE IF NOT EXISTS graphDataTable WITH replication={'replication_factor':1,'class':'SimpleStrategy'}
^^^^^^
 (ql error -4)
        at com.datastax.oss.driver.api.core.servererrors.UnauthorizedException.copy(UnauthorizedException.java:49)
        at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
        at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
        at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
        at org.janusgraph.diskstorage.cql.CQLStoreManager.initializeKeyspace(CQLStoreManager.java:191)
        at org.janusgraph.diskstorage.cql.CQLStoreManager.<init>(CQLStoreManager.java:142)
        ... 31 more


Thanks,
Pawan


Re: Different query languages

Mladen Marović
 

Hi,

thanks for the info, I'll check that out.

Kind regards,

Mladen Marović


Re: Different query languages

rngcntr
 

Hi Mladen!

The first thing that comes to my mind when reading this is Cypher for Gremlin. However, as this project only translates Cypher queries into Gremlin queries, I do not expect a significant difference in performance. For users, however, I can imagine it being quite helpful to be able to write queries in a language they are already familiar with. Concerning native support of other query languages, I am not aware of any future plans. This does not mean that we would strictly rule out support for other languages, but I think it would require major rewrites across the entire code base.
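
As a rough sketch, client-side translation with that project looks roughly like this (class and method names as I remember them from the Cypher for Gremlin README, so treat this as untested):

// import org.opencypher.gremlin.translation.TranslationFacade;

String cypher = "MATCH (p:person) WHERE p.age > 30 RETURN p.name";
String gremlin = new TranslationFacade().toGremlinGroovy(cypher);
// 'gremlin' now holds a Gremlin-Groovy string that can be submitted to a Gremlin Server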


Different query languages

Mladen Marović
 

Hello,

one of the reasons I started using JanusGraph a few years ago was its support for TinkerPop, which is supposed to be a vendor-agnostic graph computing framework. However, multiple different graph databases have appeared since then (each being "the best one yet", of course), and TinkerPop/Gremlin adoption doesn't seem to be as widespread as I expected. There are now many different languages for querying graphs (openCypher, Gremlin, GSQL, SPARQL...), with possibly different niches and without a clear standard like, for example, SQL for relational databases. This is quite a problem for a user who wants to try out different databases and benchmark them for a specific use case, and maybe even has a lot of benchmarking scripts already prepared.

Are there any plans to support other graph query languages besides Gremlin in JanusGraph? Is there maybe a plugin or some other type of support for this that I'm not aware of yet?

Thanks in advance,

Mladen Marović


Query Generation

Vinayak Bali
 

Hi All, 


I request you to check the problem posted on StackOverflow and share your feedback.

Thanks & Regards,
Vinayak


Re: Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

Mladen Marović
 

Hi Boxuan,

thanks for the confirmation, no need to apologize. I created the related issue at https://github.com/JanusGraph/janusgraph/issues/2833 and will submit a pull request after some more testing.

Kind regards,

Mladen Marović


Re: Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

Boxuan Li
 

Hi Mladen,

I can confirm this is indeed a bug. I apologize that when writing the test, I didn't notice the second open call was just returning the cached instance. Thank you for reporting and fixing it! It would be great if you could create an issue and a pull request for it.

Best regards,
Boxuan


Re: Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

Mladen Marović
 

Hello,

I ran the test CQLConfiguredGraphFactoryTest.templateConfigurationShouldSupportMultiHosts() and it was indeed successful. However, the test itself does not check the everyday scenario in which it is normal for JanusGraph instances to restart, graphs to be closed, etc. Let me demonstrate.

The test contains the following lines:

ConfiguredGraphFactory.createTemplateConfiguration(getTemplateConfigWithMultiHosts());
final StandardJanusGraph graph = (StandardJanusGraph) ConfiguredGraphFactory.create("graph1");
final StandardJanusGraph graph1 = (StandardJanusGraph) ConfiguredGraphFactory.open("graph1");
assertNotNull(graph);
assertEquals(graph, graph1);

ConfiguredGraphFactory.create() fetches a configuration template, fills some additional properties, opens the graph using this new configuration object (very important), and then stores the configuration object using:

configManagementGraph.createConfiguration(ConfigurationUtil.loadMapConfiguration(templateConfigMap));  // Very suspicious!!!

The next line in the test, ConfiguredGraphFactory.open(), attempts to open that graph, but since it was already opened in the previous create() call, JanusGraphManager will simply return the already successfully opened instance.

However, it seems to me that the problem appears in the configuration saved at the end of the create() call (the suspicious line), because it forces list parsing. Therefore, the stored configuration is different from the one used to open the graph for the first time, but it will be used in all other open() calls, e.g. in my case after restarting the JanusGraph instance. To prove this, I changed the test only slightly, to close the graph before opening it again:

ConfiguredGraphFactory.createTemplateConfiguration(getTemplateConfigWithMultiHosts());
final StandardJanusGraph graph = (StandardJanusGraph) ConfiguredGraphFactory.create("graph1");
graph.close();
final StandardJanusGraph graph1 = (StandardJanusGraph) ConfiguredGraphFactory.open("graph1");
// assertNotNull(graph);
// assertEquals(graph, graph1);
assertNotNull(graph1);

This test crashes when trying to open graph1 after the close().

Since list parsing is forced when opening the graph both as part of the create() call and in the open() call, the configuration object at the end of the create() call should be stored as is, without any changes. After changing the suspicious line to:

configManagementGraph.createConfiguration(new MapConfiguration(templateConfigMap));

the test passed without issues.

I haven't tried building a custom JanusGraph release to test this change fully in my case yet, but I suspect it should be OK. If someone else can confirm that this is indeed a bug, I'll be happy to open an issue on GitHub and submit a PR.

Kind regards,

Mladen Marović


Re: Flatfile for Janusgraph Backend

Oleksandr Porunov
 

Hi,

JanusGraph supports Oracle Berkeley DB (Java Edition), which is embedded in the JVM and works either with the filesystem (by default) or in-memory (if configured so).
That might be something you are looking for. For BDB to work, all you need to do is set two required parameters:
storage.directory = /path/to/directory/where/you/want/to/store/data/
storage.backend = berkeleyje

Thus, all your data will be stored in files on your local filesystem (or distributed, if used with something like NFS, GlusterFS, etc.).
That said, berkeleyje has some limitations. I would strongly recommend testing your use case before considering it for production, as it's quite easy to corrupt your files with BDB.
You may also want to set the configuration `storage.berkeleyje.lock-mode: LockMode.READ_UNCOMMITTED`; I remember having some transaction problems without it, but I no longer recall the details.
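
For completeness, opening such a graph programmatically is just a few lines (the directory path is a placeholder):

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

JanusGraph graph = JanusGraphFactory.build()
        .set("storage.backend", "berkeleyje")
        .set("storage.directory", "/path/to/directory/where/you/want/to/store/data/")
        .open();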

Hope it helps.

Best regards,
Oleksandr Porunov


Re: Performance Improvement

Vinayak Bali
 

Hi Oleksandr, 

Thank you for the detailed explanation regarding the configuration and indexes. I will dig deeper into it and try to resolve the problem.
However, I think the queries I am executing are not efficient.
I request you to share the Gremlin queries for the two cases mentioned in the previous mail; that will help a lot in validating my queries.

Thanks & Regards,
Vinayak




Re: Performance Improvement

Oleksandr Porunov
 

Hi Vinayak,

I didn't quite follow your statements about count, but I just want to add that if you don't use a mixed index for a count query, then your count will require iteratively returning each element and counting them in-memory (i.e. very inefficient). To check whether your count query is using a mixed index, you can use the `profile()` step.
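
For example, in the Gremlin Console (property name and value are placeholders):

g.V().has('name', 'foo').count().profile()
// look at the backend-query entries in the profile output to see which index is used;
// if you only see a full graph scan, no index is being used at all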

I also noticed that you say you need to return all properties for all vertices/edges. If so, you may consider using multiQuery, which in certain cases returns properties for your vertices faster than the valueMap() step. The only thing you need to consider when using `multiQuery` (actually, any query) is the tx-cache size (not to be confused with the database cache). If your tx-cache size is too small to hold all the vertices, then some vertices' properties will be evicted from the cache. Thus, when you try to return values for those vertex properties, new database calls will be made to retrieve them. In the worst case, every property access may lead to a separate database call. To eliminate this downside, make sure that your transaction cache size is at least as large as the number of vertices you are accessing. In that case `multiQuery().addAllVertices(yourVertices).properties()` will return all properties for all vertices and hold them in-memory instead of evicting them.
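
A minimal sketch of that pattern, assuming `tx` is your JanusGraphTransaction and `yourVertices` is a Collection of JanusGraphVertex (method and type names as I recall them from the core API, so please double-check):

// import java.util.Map;
// import org.janusgraph.core.JanusGraphVertex;
// import org.janusgraph.core.JanusGraphVertexProperty;

// prefetch properties for the whole batch in as few backend calls as possible;
// they stay cached as long as cache.tx-cache-size is at least the number of vertices
Map<JanusGraphVertex, Iterable<JanusGraphVertexProperty>> props =
        tx.multiQuery().addAllVertices(yourVertices).properties();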

Moreover, it looks like your use cases are read-heavy and not write-heavy. You may improve your performance by making sure all your writes use consistency-level=ALL and all your reads use consistency-level=ONE. You may also want to disable the consistency check as well as internal/external checks for your transactions if you are sure about your data. It will make some of your queries faster, but less safe.
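
In the CQL backend these are plain configuration options, e.g. (sketch; verify the exact option names against the configuration reference):

storage.cql.read-consistency-level=ONE
storage.cql.write-consistency-level=ALL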

You also need to make sure that you have configured your CQL driver throughput optimally for your load. If JanusGraph is embedded in your application, make sure the application has the smallest possible latency to your Cassandra nodes (you may even consider placing the application on the same nodes as Cassandra, or just moving your Gremlin Server to those nodes and using a remote connection).

There are many JanusGraph and CQL driver configurations which you may use to tune performance for your use case. This topic is too broad to give a one-size-fits-all solution; different use cases may need different approaches. I would strongly recommend exploring all JanusGraph configuration options here: https://docs.janusgraph.org/configs/configuration-reference/ . Being aware of all the options will allow you to configure your general JanusGraph settings, your transaction settings, and your CQL driver settings much better. For advanced CQL configuration options, see the reference here: https://docs.datastax.com/en/developer/java-driver/4.13/manual/core/configuration/reference/ (storage.cql.internal in JanusGraph).

You may also try exploring other storage backends which may give you smaller latency (hence better performance), like ScyllaDB, Aerospike, etc.

Best regards,
Oleksandr Porunov


Re: Template configuration parameters with comma-separated lists in Janusgraph 0.6.0

hadoopmarc@...
 

Hi Mladen,

You are right, your issue is different from the one I mentioned about GraphManager.

I mentioned earlier that the JanusGraph test suite covers your use case:
https://github.com/JanusGraph/janusgraph/blob/v0.6.0/janusgraph-cql/src/test/java/org/janusgraph/core/cql/CQLConfiguredGraphFactoryTest.java

You can verify in the CI logs that this test passes successfully; see e.g. the logs for a recent commit on master (the line starting with "Run mvn verify"):
https://github.com/JanusGraph/janusgraph/runs/3775465848?check_suite_focus=true

I do acknowledge that the return value you mention for
ConfiguredGraphFactory.getConfiguration('default_').get('storage.hostname')
==>[test-master, test-worker1]
looks suspicious, but in what way does your use case differ from the tests with "MultiHost" CQL?

Best wishes,    Marc


Re: Performance Improvement

Vinayak Bali
 

Hi All, 

I updated the JanusGraph version to 0.6.0 and added the parallel-execution settings in the configuration files, as suggested by Oleksandr. Still, the performance has not improved. I think I am missing something, so I am describing my requirement in detail.

The attachment to this mail contains the data model which I am using to query the schema.

This data model will be visible to the user in the UI. The user can choose any number of nodes and relationships from the UI. Based on the selection, I create queries to retrieve the data and the count, respectively. If the count exceeds a certain limit, an additional filter must be added by the user. Hence the count query is an important aspect of the implementation, and its performance matters. Let's consider some cases of the count queries:

Case 1: User selection: Node1, Node3 and Relation3
Count query output required: Node1: 25, Node3: 30, Relation3: 50, i.e. only the nodes that participate in the relationship
Data query: Must return all the data for Node1, Node3, and Relation3, including the properties

Case 2: User selection: Node1, Node3 and Relation3, plus Node4, Node2 and Relation4
Count query output: Node1: 25, Node3: 30, Relation3: 50, Node4: 10, Node2: 34, Relation4: 45, again only the nodes that participate in the relationship
Data query: Must return all the data for Node1, Node3, Relation3, Node4, Node2 and Relation4, including the properties

Case 3: Filters can be added to the above cases based on properties.

I have tried using union along with aggregate steps, but the performance is not as required. The hardware configuration of the machine is not an issue.
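
For example, for Case 1 the kind of count query I have been experimenting with looks roughly like this (assuming Relation3 edges point from Node1 to Node3; labels are from the attached data model):

g.E().hasLabel('Relation3').
  where(outV().hasLabel('Node1')).
  where(inV().hasLabel('Node3')).
  aggregate('rel').cap('rel').
  union(unfold().outV().dedup().count(),
        unfold().inV().dedup().count(),
        unfold().count())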

I request you all to take a look and provide your valuable suggestions, based on experience, to solve the problem.
If possible, share both the count and data queries for all the cases, along with the configuration required to improve performance.

Thanks & Regards,
Vinayak


On Mon, Sep 13, 2021 at 2:22 AM Oleksandr Porunov <alexandr.porunov@...> wrote:
Hi Vinayak,

Version 0.6.0 of JanusGraph has been released. I posted some quick tips to improve throughput to your CQL storage here:
https://lists.lfaidata.foundation/g/janusgraph-users/message/6148
I also made a post on LinkedIn with links to the relevant documentation parts and several more suggestions about internal ExecutorService usage here: https://www.linkedin.com/posts/porunov_release-060-janusgraphjanusgraph-activity-6840714301062307840-r6Uw

In 0.6.0 you can improve your CQL throughput drastically with a simple configuration, `storage.cql.executor-service.enabled: false`, which I definitely recommend; but you should also properly configure the throughput-related options.

Best regards,
Oleksandr


Re: Potential transaction issue (JG 0.6.0)

sergeymetallic@...
 

Looking at the changes in 0.6.0, I think this problem was just hidden in the previous version, as resources were not released properly:

  private void releaseTransaction() {
-     //TODO: release non crucial data structures to preserve memory?
      isOpen = false;
      graph.closeTransaction(this);
-     vertexCache.close();
+     vertexCache = null;
+     indexCache = null;
+     addedRelations = null;
+     deletedRelations = null;
+     uniqueLocks = null;
+     newVertexIndexEntries = null;
+     newTypeCache = null;
  }


Potential transaction issue (JG 0.6.0)

sergeymetallic@...
 

The issue can be reproduced only under certain conditions; I have not found a recipe to reproduce it in an arbitrary environment.
We have a query of the form
g.inject((int) 1).union(...).limit(5L)

We have several subqueries in the "union" that return a large amount of data. While executing this query we get the following error:
"java.lang.NullPointerException: null
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getInternalVertex(StandardJanusGraphTx.java:508)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.lambda$new$6(StandardJanusGraphTx.java:1478)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.SliceOps$1$1.accept(SliceOps.java:199)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:442)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at org.janusgraph.graphdb.util.SubqueryIterator.computeNext(SubqueryIterator.java:75)
at org.janusgraph.graphdb.util.SubqueryIterator.computeNext(SubqueryIterator.java:37)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:71)
at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:55)
at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:70)
at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:29)
at org.janusgraph.graphdb.util.CloseableIteratorUtils$1.computeNext(CloseableIteratorUtils.java:50)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.ProfiledIterator.computeNext(ProfiledIterator.java:41)
at org.janusgraph.graphdb.util.ProfiledIterator.computeNext(ProfiledIterator.java:27)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.MultiIterator.computeNext(MultiIterator.java:42)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.janusgraph.graphdb.util.MultiDistinctUnorderedIterator.computeNext(MultiDistinctUnorderedIterator.java:48)
at org.janusgraph.graphdb.util.MultiDistinctUnorderedIterator.computeNext(MultiDistinctUnorderedIterator.java:26)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:149)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep$EndStep.processNextStart(ComputerAwareStep.java:82)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep.processNextStart(ComputerAwareStep.java:44)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.step.filter.DedupGlobalStep.processNextStart(DedupGlobalStep.java:107)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:222)
at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:410)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor.java:222)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
I was able to fix the problem by checking whether the transaction is open:
public InternalVertex getInternalVertex(long vertexId) {
    // TODO: temporary fix
    if (isClosed()) {
        return null;
    }
    // return vertex but potentially check for existence
    return vertexCache.get(vertexId, internalVertexRetriever);
}
It was working fine in JanusGraph 0.5.3.

Important part is that I use "limit" here, without it everything works just fine. Maybe transaction is closing earlier than needed? Also I use remote graph via websocket

Any ideas what there might be wrong?


Re: Flatfile for Janusgraph Backend

hadoopmarc@...
 

No, JanusGraph does not have a storage backend for a single-node cluster that persists to a single file (as SQLite would). It is possible, though, to have a single Cassandra instance co-hosted on your JanusGraph machine. In fact, Cassandra is included in the janusgraph-full distribution to enable this out of the box.
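
With that setup, the graph properties simply point at the local instance, e.g.:

storage.backend=cql
storage.hostname=127.0.0.1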

Best wishes,   Marc


Flatfile for Janusgraph Backend

Vivek Singh Raghuwanshi
 

Hi Team,
I am working on an issue and want to ask: I am using HBase and Solr for our current JanusGraph setup.
Is it possible to replace HBase with a flat file, instead of an in-memory backend?
 
Thanks