Re: Failed to connect to a Cassandra cluster's all nodes without default port

Boxuan Li
 

Hi,

Can you check your Cassandra nodes' configurations? It looks like some of your nodes are still using 9042 as the native transport port.
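If every node does listen on the same non-default port (on the Cassandra side this is `native_transport_port` in cassandra.yaml), one thing worth trying is to pass the hosts and the port separately through `storage.hostname` and `storage.port` rather than as `host:port` pairs. The sketch below uses only the Java standard library and is not part of JanusGraph: the class name `ContactPoints` and its methods are hypothetical. It splits a `host:port` list into a host list plus one shared port, and rejects lists that mix ports.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a JanusGraph class: turns "host:port,host:port"
// into a host list plus a single shared port.
final class ContactPoints {
    final List<String> hosts;
    final int port;

    private ContactPoints(List<String> hosts, int port) {
        this.hosts = hosts;
        this.port = port;
    }

    static ContactPoints parse(String hostPortList) {
        List<String> hosts = new ArrayList<>();
        int sharedPort = -1;
        for (String entry : hostPortList.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a stray or trailing comma
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing port in: " + trimmed);
            }
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            if (sharedPort != -1 && sharedPort != port) {
                throw new IllegalArgumentException(
                        "Nodes use different ports: " + sharedPort + " and " + port);
            }
            sharedPort = port;
            hosts.add(trimmed.substring(0, colon));
        }
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("No hosts given");
        }
        return new ContactPoints(hosts, sharedPort);
    }

    // Comma-separated host names, in the form storage.hostname expects.
    String hostnameOption() {
        return String.join(",", hosts);
    }
}
```

The result would then be handed to the builder as `.set("storage.hostname", cp.hostnameOption()).set("storage.port", cp.port)`. This assumes the CQL backend applies `storage.port` to every contact point, and it does not help when nodes listen on different ports.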

Cheers,
Boxuan

On Thu, Jun 17, 2021 at 4:39 PM, <zhouyu74748585@...> wrote:

Hello everyone, I am trying to connect to a Cassandra cluster whose port is not the default one, like this:

StandardJanusGraph graph = (StandardJanusGraph) JanusGraphFactory.build().set("storage.backend", "cql").
set("storage.hostname", "192.168.223.3:51705,192.168.223.4:51705,192.168.223.5:51705,").
open();
But when I start it, only one node's port is correct; the other nodes' ports change to 9042, the default, so I can only connect to one node.
I tried to fix this, but my fix only works when all nodes use the same port. If different nodes have different ports, it does not work.
So I hope someone has a good solution.


Re: Custom analyzer with traversal queries

Boxuan Li
 

Hi Flo,

The Elasticsearch index is only used when your query is a global query (a.k.a. a graph-centric query in JanusGraph), i.e. one that looks up vertices by conditions. `traversal().V().has(…)` is a global query.

On the other hand, `traversal().V(idx1, idx2, …).has(…)` is a vertex-centric query rather than a global query. Elasticsearch is not involved, so your custom analyzers are not used.
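To make the difference concrete, here is a simplified stand-in for an in-memory text predicate of the kind described in the question (tokenization plus lower-casing). It is not the code in Text.java: the class and method names are hypothetical and the tokenization rule is only an approximation. The point is that nothing in such a predicate consults the analyzer configured in Elasticsearch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model of an in-memory "text contains" check.
// It is NOT JanusGraph's Text.java; it only illustrates the general idea.
final class InMemoryTextMatch {

    // Split on anything that is not a letter or digit, and lower-case each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // True if the lower-cased term equals one of the value's tokens.
    static boolean textContains(String value, String term) {
        return tokenize(value).contains(term.toLowerCase(Locale.ROOT));
    }
}
```

With this model, `textContains("Hello, JanusGraph World!", "JANUSGRAPH")` matches, while `textContains("running fast", "run")` does not. A custom analyzer with stemming on the Elasticsearch side could match the second case, which is one way the two query paths can disagree.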

Your observations are valid. The documentation is not very clear indeed.

Unfortunately, I don't see any workaround here. You could probably modify Text.java and build your own JanusGraph. You may also want to create a feature request to allow custom analyzers without indexing backends. Perhaps we could make Text.java pluggable (not sure if that's possible).

Best regards,
Boxuan

On Jun 17, 2021, at 3:56 PM, florent@... wrote:

Hello everyone,


As it's my first post, I'd first like to thank you all for your work! It has been pretty neat to discover all of it!

We've started to investigate the use of custom analyzers in our schema (-> https://docs.janusgraph.org/v0.5/index-backend/field-mapping/#for-elasticsearch), where ES is used as an indexing backend.

From what I've seen, initial selections will indeed use the index (e.g. `traversal().V().has(...)`). However, it isn't always used, and sometimes the JanusGraph implementation of the predicate is used (e.g. with `traversal().V(idx1, idx2, ...).has(...)`). In this latter case, it's the predicates as implemented here that are run. For text fields, it looks like tokenization and lower-casing are performed by them directly.

I haven't seen mention of this case in the documentation. So, I'm wondering if my observations are valid.

If so, is there a way to force the usage of ES indices? An IDs query could be a way to implement such a feature (at least for ES).


All the best,
Flo


Re: EOL version components

Boxuan Li
 

Hi Kumkar,

There is no standard process. I would suggest creating a [Feature Request] type issue: https://github.com/JanusGraph/janusgraph/issues/new?assignees=&labels=&template=feature-request.md 

You may want to state which libraries are EOL, and preferably link to their EOL statements/documentation. I recommend checking the dependency versions on the master branch first before reporting.

Best regards,
Boxuan

On Jun 17, 2021, at 9:57 PM, kumkar.dev@... wrote:

Thanks Boxuan, this really helps!

Can you please give me some pointers on how to create a GitHub issue for an EOL version?

Thanks
Dev Kumkar


Re: EOL version components

kumkar.dev@...
 

Thanks Boxuan, this really helps!

Can you please give me some pointers on how to create a GitHub issue for an EOL version?

Thanks
Dev Kumkar


Re: EOL version components

Boxuan Li
 

Hi Kumkar,

The upcoming 0.6.0 release contains (but is not limited to) the following updates:

  • hppc: 0.8.0
  • noggit: 0.8
  • commons-configuration2: 2.7
  • commons-lang3: 3.11

You are welcome to create a GitHub issue for any EOL version you observe that is still being used by JanusGraph.

Best regards,
Boxuan

On Jun 17, 2021, at 9:35 PM, kumkar.dev@... wrote:

Hello

We are using JanusGraph 0.5.3, and some of the components it uses are either old or EOL versions.

Old versions used:
  • hppc-0.7.1.jar
  • jcabi-log-0.14.jar
  • noggit-0.6.jar
EOL versions used:
  • commons-collections-3.2.2.jar
  • commons-configuration-1.10.jar
  • commons-lang-2.6.jar
  • javapoet-1.8.0.jar
Are there any upcoming updates that could be shared here, indicating when and which of these components will be updated?

Thanks
Dev Kumkar


EOL version components

kumkar.dev@...
 

Hello

We are using JanusGraph 0.5.3, and some of the components it uses are either old or EOL versions.

Old versions used:
  • hppc-0.7.1.jar
  • jcabi-log-0.14.jar
  • noggit-0.6.jar
EOL versions used:
  • commons-collections-3.2.2.jar
  • commons-configuration-1.10.jar
  • commons-lang-2.6.jar
  • javapoet-1.8.0.jar
Are there any upcoming updates that could be shared here, indicating when and which of these components will be updated?

Thanks
Dev Kumkar


Failed to connect to a Cassandra cluster's all nodes without default port

zhouyu74748585@...
 

Hello everyone, I am trying to connect to a Cassandra cluster whose port is not the default one, like this:

StandardJanusGraph graph = (StandardJanusGraph) JanusGraphFactory.build().set("storage.backend", "cql").
set("storage.hostname", "192.168.223.3:51705,192.168.223.4:51705,192.168.223.5:51705,").
open();
But when I start it, only one node's port is correct; the other nodes' ports change to 9042, the default, so I can only connect to one node.
I tried to fix this, but my fix only works when all nodes use the same port. If different nodes have different ports, it does not work.
So I hope someone has a good solution.


Custom analyzer with traversal queries

florent@...
 

Hello everyone,


As it's my first post, I'd first like to thank you all for your work! It has been pretty neat to discover all of it!

We've started to investigate the use of custom analyzers in our schema (-> https://docs.janusgraph.org/v0.5/index-backend/field-mapping/#for-elasticsearch), where ES is used as an indexing backend.

From what I've seen, initial selections will indeed use the index (e.g. `traversal().V().has(...)`). However, it isn't always used, and sometimes the JanusGraph implementation of the predicate is used (e.g. with `traversal().V(idx1, idx2, ...).has(...)`). In this latter case, it's the predicates as implemented here that are run. For text fields, it looks like tokenization and lower-casing are performed by them directly.

I haven't seen mention of this case in the documentation. So, I'm wondering if my observations are valid.

If so, is there a way to force the usage of ES indices? An IDs query could be a way to implement such a feature (at least for ES).


All the best,
Flo


Re: Count Query

hadoopmarc@...
 

Hi Vinayak,

Can you please provide a script to generate a sample graph so that we can work on the same graph? The current question is a follow-up on an earlier discussion on this list in May, see https://lists.lfaidata.foundation/g/janusgraph-users/topic/82659112 .

Best wishes,     Marc


Count Query

Vinayak Bali
 

Hi All, 

Please take a look at the following issue and provide your feedback.

Thanks & Regards,
Vinayak


Re: Issues while iterating over self-loop edges in Apache Spark

Mladen Marović
 

Thanks for the reply. I created an issue at: https://github.com/JanusGraph/janusgraph/issues/2669

Kind regards,

Mladen Marović


Dynamic control of graph configuration

fredrick.eisele@...
 

Which, if any, of the graph configuration properties are dynamic?

In particular, I would like to toggle `storage.batch-loading`.


Re: Transaction Recovery and Bulk Loading

Boxuan Li
 

Hi Marc,

I am not familiar with batch-loading, but from what I understand, it might be because of performance. Batch-loading aims to persist data as fast as possible, at the expense of functionality like consistency checks. A write-ahead log certainly slows down the bulk loading process.

Also, technically, when batch-loading is enabled, there is a chance that your data gets persisted to your data storage in the StandardJanusGraph::prepareCommit method, which happens before the WAL is written. When batch-loading is disabled, your data always gets persisted only after the WAL is written. I am not sure if there is any particular reason for this, but I guess it is by design.

There might or might not be other reasons behind the design choice, but performance is what comes to my mind when I see your question.

Cheers,
Boxuan

On Wed, Jun 9, 2021 at 10:53 PM, madams via lists.lfaidata.foundation <madams=open-systems.com@...> wrote:

Hi all,

We've been integrating our pipelines with JanusGraph for some time now and it's been working great, thanks to the developers!
We use the transaction recovery job and enabled batch-loading for performance, and then we realized that the write-ahead transaction log is not used when batch-loading is enabled.
Out of curiosity, is there any reason for this?
At the moment we have disabled both batch-loading and consistency checks. We've thought about replacing the transaction recovery with a reindexing job, but reindexing is quite a heavy operation.

Thanks,
Best Regards,
Marc


Re: Issues while iterating over self-loop edges in Apache Spark

hadoopmarc@...
 

Hi Mladen,

Indeed, the self-loop logic you point to still exists in:
https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-hadoop/src/main/java/org/janusgraph/hadoop/formats/util/JanusGraphVertexDeserializer.java

I guess the intent of filtering these self-loop edges is to prevent a single self-loop edge from appearing twice, once as an IN edge and once as an OUT edge. I also guess that the actual implementation is buggy: it is not the responsibility of the InputFormat to filter any data (your example!) but rather to faithfully represent all data present. Can you report an issue for this at https://github.com/JanusGraph/janusgraph/issues ?

This also means that there is not an easy way out, other than starting with a private build with a fix (and possibly contributing the fix as a PR).

Best wishes,    Marc


Re: Janus Graph Performance with Cassandra vs BigTable

hadoopmarc@...
 

Hi Vishal,

Your question is very general. What is most important to you: write performance, simple queries, complex queries? Do you mean a comparison between managed Cassandra and managed Bigtable in terms of the Euros needed for a specific workload? I am not aware of independent benchmark results for the JanusGraph storage backends, while vendors can be skimpy about the circumstances of the benchmarks they present.

Some general notions:
  • Cassandra has a non-Java drop-in replacement, ScyllaDB, so large JanusGraph deployments often use ScyllaDB; see the materials on https://janusgraph.org/
  • JanusGraph does not have much code that is specific to any storage backend, but the adapters were designed with Cassandra in mind
  • Compatibility between JanusGraph and Bigtable is only maintained indirectly through JanusGraph-HBase compatibility (I am not aware that this has caused problems in the past, though)
Best wishes,     Marc


Re: a problem about elasticsearch

Peter Corless
 

Is it ES [the software] that is bottlenecking, or could it be the HW you have it running on? If the HW isn't the issue, have you been able to trace where the issue is in ES?

I hate to be "that guy," but if the underlying storage engine isn't keeping up, you have options with JanusGraph. Of course, if you resolve the issue and can keep running on ES, I am all for least-disruptive solutions.

But if not, I'd be remiss to not put in a plug for Scylla as a better performing option as a JanusGraph data store.

Hope you get it resolved!

On Fri, Jun 11, 2021, 1:37 AM <anjanisingh22@...> wrote:
Hi Anshul,

I am facing the same issue. Did you find a solution?

Thanks,
Anjani


Re: a problem about elasticsearch

anjanisingh22@...
 

Hi Anshul,

I am facing the same issue. Did you find a solution?

Thanks,
Anjani


Janus Graph Performance with Cassandra vs BigTable

Vishal Gupta <vgupta@...>
 

Hi Community/Team, 

I see that JanusGraph can be integrated with multiple storage backends like Cassandra and Bigtable.
I am trying to evaluate which storage backend is more performant for JanusGraph.

Do people have any recommendations here? Has anyone done a performance comparison of JanusGraph + Bigtable vs JanusGraph + Cassandra?

Thanks
Vishal


Transaction Recovery and Bulk Loading

madams@...
 

Hi all,

We've been integrating our pipelines with JanusGraph for some time now and it's been working great, thanks to the developers!
We use the transaction recovery job and enabled batch-loading for performance, and then we realized that the write-ahead transaction log is not used when batch-loading is enabled.
Out of curiosity, is there any reason for this?
At the moment we have disabled both batch-loading and consistency checks. We've thought about replacing the transaction recovery with a reindexing job, but reindexing is quite a heavy operation.

Thanks,
Best Regards,
Marc


Issues while iterating over self-loop edges in Apache Spark

Mladen Marović
 

Hello,

While debugging some Apache Spark jobs that process data from a JanusGraph graph, I noticed some issues with self-loop edges (edges that connect a vertex to itself). The data is read using:

javaSparkContext.newAPIHadoopRDD(hadoopConfiguration(), CqlInputFormat.class, NullWritable.class, VertexWritable.class)

When I try to process all outbound edges of a single vertex using:

vertex.edges(Direction.OUT)

and that vertex has multiple self-loop edges with the same edge label, the iterator always returns only one such edge. Edges that are not self-loop are all returned as expected.

To give a specific example, if I have a vertex V0 with edges E1, E2, E3, E4, E5 that lead to vertices V1, V2, V3, V4, V5, the call vertex.edges(Direction.OUT) will return an iterator that iterates over all five edges. However, if I have a vertex V0 with edges E1, E2, E3 that lead to V1, V2, V3, and self-loop edges EL1, EL2, EL3, the iterator will iterate over E1, E2, E3, and only one of (EL1, EL2, EL3), giving a total of four edges instead of the expected six.
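The observed behavior can be reproduced with a small toy model that uses only the Java standard library. This is not the JanusGraph deserializer code: the class `SelfLoopFilterDemo`, its `Edge` type and the edge label "knows" are all hypothetical. It only assumes that a second self-loop with an already-seen label is skipped, which is enough to turn six edges into four.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the observed behavior; NOT the JanusGraph implementation.
final class SelfLoopFilterDemo {

    static final class Edge {
        final String id;
        final String label;
        final String outVertex;
        final String inVertex;

        Edge(String id, String label, String outVertex, String inVertex) {
            this.id = id;
            this.label = label;
            this.outVertex = outVertex;
            this.inVertex = inVertex;
        }

        boolean isSelfLoop() {
            return outVertex.equals(inVertex);
        }
    }

    // Keeps every ordinary edge, but only the first self-loop per edge label.
    static List<Edge> keepOneSelfLoopPerLabel(List<Edge> edges) {
        List<Edge> kept = new ArrayList<>();
        Set<String> loopLabelsSeen = new HashSet<>();
        for (Edge edge : edges) {
            if (edge.isSelfLoop() && !loopLabelsSeen.add(edge.label)) {
                continue; // a self-loop with this label was already kept
            }
            kept.add(edge);
        }
        return kept;
    }

    // V0 with E1..E3 to V1..V3 and three self-loops EL1..EL3, all labelled "knows".
    static List<Edge> sampleEdges() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("E1", "knows", "V0", "V1"));
        edges.add(new Edge("E2", "knows", "V0", "V2"));
        edges.add(new Edge("E3", "knows", "V0", "V3"));
        edges.add(new Edge("EL1", "knows", "V0", "V0"));
        edges.add(new Edge("EL2", "knows", "V0", "V0"));
        edges.add(new Edge("EL3", "knows", "V0", "V0"));
        return edges;
    }
}
```

Calling `SelfLoopFilterDemo.keepOneSelfLoopPerLabel(SelfLoopFilterDemo.sampleEdges())` leaves E1, E2, E3 and EL1, which matches the four edges seen in the Spark job.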

After further analysis, I came upon this commit:

https://github.com/JanusGraph/janusgraph/commit/d3006dc939c1b640bb263806abd3fd6bee630d12

which explicitly added code that skips deserializing multiple self-loop edges. The code from the linked commit is still present in org.janusgraph:janusgraph-hadoop:0.5.3 and seems to be the cause of this unexpected behavior.

My questions are as follows:

  1. What is the reason behind implementing the change from the given commit?
  2. Is there another way to iterate on all edges, including (possibly) multiple self-loop edges with the same edge label?

Kind regards,

Mladen Marović
