
EOL version components

kumkar.dev@...
 

Hello

We are using JanusGraph 0.5.3, and some of the components it uses are either old versions or EOL versions.

Old versions used:
  • hppc-0.7.1.jar
  • jcabi-log-0.14.jar
  • noggit-0.6.jar
EOL versions used:
  • commons-collections-3.2.2.jar
  • commons-configuration-1.10.jar
  • commons-lang-2.6.jar
  • javapoet-1.8.0.jar
Are there any upcoming updates you could share here, indicating when and which of these components will be updated?

Thanks
Dev Kumkar


Failed to connect to all nodes of a Cassandra cluster with a non-default port

zhouyu74748585@...
 

Hello everyone. When I try to connect to a Cassandra cluster whose port is not the default, like this:

StandardJanusGraph graph = (StandardJanusGraph) JanusGraphFactory.build()
    .set("storage.backend", "cql")
    .set("storage.hostname", "192.168.223.3:51705,192.168.223.4:51705,192.168.223.5:51705")
    .open();
When I start it, only one node's port is correct; the others fall back to 9042, the default, so I can only connect to one node.
I tried to work around this (roughly the sketch below), but that only helps when all nodes use the same port; if different nodes use different ports, it does not work.
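
My workaround is roughly this single-port sketch (storage.port is the standard JanusGraph option; it applies one port to every host, which matches what I see):

// Workaround sketch: only helps when every node listens on the same
// non-default port, because storage.port sets a single port for all hosts.
JanusGraph graph = JanusGraphFactory.build()
    .set("storage.backend", "cql")
    .set("storage.hostname", "192.168.223.3,192.168.223.4,192.168.223.5")
    .set("storage.port", 51705)
    .open();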
So I hope someone will have a good solution.


Custom analyzer with traversal queries

florent@...
 

Hello everyone,


As this is my first post, I'd first like to thank you all for your work! It has been a pleasure to discover it all!

We've started to investigate the use of custom analyzers in our schema (see https://docs.janusgraph.org/v0.5/index-backend/field-mapping/#for-elasticsearch), with ES as the indexing backend.

From what I've seen, initial selections do indeed use the index (e.g. `traversal().V().has(...)`). However, the index isn't always used, and sometimes the JanusGraph implementation of the predicate is applied instead (e.g. with `traversal().V(idx1, idx2, ...).has(...)`). In this latter case, it's the predicates as implemented in JanusGraph that are run. For text fields, it looks like tokenization and lower-casing are performed by them directly.
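
To make the two cases concrete, a sketch in Java (the property key "description", the ids idx1/idx2, and the choice of Text.textContains are placeholders, not our actual schema):

GraphTraversalSource g = graph.traversal();

// Case 1: global lookup -- the predicate can be pushed down to the ES mixed
// index, so the custom analyzer applies.
g.V().has("description", Text.textContains("token")).toList();

// Case 2: starting from explicit ids -- the predicate is evaluated by
// JanusGraph itself (org.janusgraph.core.attribute.Text), which tokenizes
// and lower-cases on its own.
g.V(idx1, idx2).has("description", Text.textContains("token")).toList();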

I haven't seen this case mentioned in the documentation, so I'm wondering whether my observations are valid.

If so, is there a way to force the usage of ES indices? An IDs query could be a way to implement such a feature (at least for ES).


All the best,
Flo


Re: Count Query

hadoopmarc@...
 

Hi Vinayak,

Can you please provide a script that generates a sample graph so that we can work on the same graph? The current question is a follow-up to an earlier discussion on this list in May; see https://lists.lfaidata.foundation/g/janusgraph-users/topic/82659112

Best wishes,     Marc


Count Query

Vinayak Bali
 

Hi All, 

Please take a look at the following issue and provide your feedback.

Thanks & Regards,
Vinayak


Re: Issues while iterating over self-loop edges in Apache Spark

Mladen Marović
 

Thanks for the reply. I created an issue at: https://github.com/JanusGraph/janusgraph/issues/2669

Kind regards,

Mladen Marović


Dynamic control of graph configuration

fredrick.eisele@...
 

Which, if any, of the graph configuration properties are dynamic?

In particular, I would like to toggle `storage.batch-loading`.
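
For instance, a per-transaction sketch of what I'd like to achieve (assuming TransactionBuilder's enableBatchLoading() maps onto the storage.batch-loading behavior):

// Sketch: toggling batch loading per transaction instead of graph-wide.
JanusGraphTransaction tx = graph.buildTransaction()
    .enableBatchLoading()
    .start();
try {
    // ... bulk writes ...
    tx.commit();
} catch (Exception e) {
    tx.rollback();
    throw e;
}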


Re: Transaction Recovery and Bulk Loading

Boxuan Li
 

Hi Marc,

I am not familiar with batch-loading, but from what I understand, it might be because of performance. Batch-loading aims to persist data as fast as possible, sacrificing functionality such as consistency checks. A write-ahead log certainly slows down the bulk loading process.

Also, technically, when batch-loading is enabled, there is a chance that your data gets persisted to storage in the StandardJanusGraph::prepareCommit method, which is before the WAL is written. When batch-loading is disabled, your data is only ever persisted after the WAL is written. I am not sure whether there is a particular reason here, but I guess this is by design.

There might or might not be other reasons behind the design choice, but performance is what comes to my mind when I see your question.

Cheers,
Boxuan

On Wed, Jun 9, 2021 at 10:53 PM, madams via lists.lfaidata.foundation <madams=open-systems.com@...> wrote:

Hi all,

We've been integrating our pipelines with JanusGraph for some time now, and it's been working great, thanks to the developers!
We use the transaction recovery job and enabled batch-loading for performance, and then we realized that the write-ahead transaction log is not used when batch-loading is enabled.
Out of curiosity, is there any reason for this?
At the moment we have disabled batch loading and consistency checks. We've thought about replacing transaction recovery with a reindexing job, but reindexing is quite a heavy operation.

Thanks,
Best Regards,
Marc


Re: Issues while iterating over self-loop edges in Apache Spark

hadoopmarc@...
 

Hi Mladen,

Indeed, the self-loop logic you point to still exists in:
https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-hadoop/src/main/java/org/janusgraph/hadoop/formats/util/JanusGraphVertexDeserializer.java

I guess the intent of filtering these self-loop edges is to prevent a single self-loop edge from appearing twice, as both an IN edge and an OUT edge. I also guess that the actual implementation is buggy: it is not the responsibility of the InputFormat to filter any data (as your example shows!) but rather to represent all data present faithfully. Can you report an issue for this at https://github.com/JanusGraph/janusgraph/issues ?

This also means that there is no easy way out, other than starting with a private build containing a fix (and possibly contributing the fix as a PR).

Best wishes,    Marc


Re: Janus Graph Performance with Cassandra vs BigTable

hadoopmarc@...
 

Hi Vishal,

Your question is very general. What is most important to you: write performance, simple queries, complex queries? Do you mean a comparison between managed Cassandra and managed Bigtable in terms of euros needed for a specific workload? I am not aware of independent benchmark results for the JanusGraph storage backends, and vendors can be skimpy about the circumstances of the benchmarks they present.

Some general notions:
  • Cassandra has a non-Java drop-in replacement, ScyllaDB, so large JanusGraph deployments often use ScyllaDB; see the materials on https://janusgraph.org/
  • JanusGraph does not have much code that is specific to any one storage backend, but the adapters were designed with Cassandra in mind
  • Compatibility between JanusGraph and Bigtable is only maintained indirectly, through JanusGraph-HBase compatibility (I am not aware that this has caused problems in the past, though)
Best wishes,     Marc


Re: a problem about elasticsearch

Peter Corless
 

Is it ES [the software] that is bottlenecking, or could it be the HW you have it running on? If the HW isn't the issue, have you been able to trace where the issue is in ES?

I hate to be "that guy," but if the underlying storage engine isn't keeping up, you have options with JanusGraph. Of course, if you resolve the issue and can keep running on ES, I am all for least-disruptive solutions.

But if not, I'd be remiss not to put in a plug for Scylla as a better-performing option as a JanusGraph data store.

Hope you get it resolved!

On Fri, Jun 11, 2021, 1:37 AM <anjanisingh22@...> wrote:
Hi Anshul,

I am facing the same issue. Did you get any solution for it?

Thanks,
Anjani


Re: a problem about elasticsearch

anjanisingh22@...
 

Hi Anshul,

I am facing the same issue. Did you get any solution for it?

Thanks,
Anjani


Janus Graph Performance with Cassandra vs BigTable

Vishal Gupta <vgupta@...>
 

Hi Community/Team, 

I see that JanusGraph can be integrated with multiple storage backends, such as Cassandra and Bigtable.
I am trying to evaluate which storage backend is more performant for JanusGraph.

I want to see if people have any recommendations here. Has anyone done a performance comparison of JanusGraph + Bigtable vs JanusGraph + Cassandra?

Thanks
Vishal


Transaction Recovery and Bulk Loading

madams@...
 

Hi all,

We've been integrating our pipelines with Janusgraph for sometime now, it's been working great, thanks to the developers!
We use the transaction recovery job and enabled batch-loading for performance, and then we realized the write ahead transaction log is not used when batch-loading is enabled.
By curiosity, is there any reason for this?
At the moment we disabled batch loading and consistency checks. We've thought about replacing the transaction recovery with a reindexing job but reindexing is quite a heavy operation.
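
For context, this is roughly how we wire up the WAL and the recovery process (a sketch; option and method names as documented for JanusGraphFactory, and the one-hour start time is arbitrary):

import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Write-ahead transaction log plus the standard recovery process (sketch).
JanusGraph graph = JanusGraphFactory.build()
    .set("storage.backend", "cql")
    .set("tx.log-tx", true)  // enable the write-ahead transaction log
    .open();
TransactionRecovery recovery = JanusGraphFactory.startTransactionRecovery(
    graph, Instant.now().minus(1, ChronoUnit.HOURS));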

Thanks,
Best Regards,
Marc


Issues while iterating over self-loop edges in Apache Spark

Mladen Marović
 

Hello,

While debugging some Apache Spark jobs that process data from a JanusGraph graph, I noticed some issues with self-loop edges (edges that connect a vertex to itself). The data is read using:

javaSparkContext.newAPIHadoopRDD(hadoopConfiguration(), CqlInputFormat.class, NullWritable.class, VertexWritable.class)

When I try to process all outbound edges of a single vertex using:

vertex.edges(Direction.OUT)

and that vertex has multiple self-loop edges with the same edge label, the iterator always returns only one such edge. Edges that are not self-loops are all returned as expected.

To give a specific example: if I have a vertex V0 with edges E1, E2, E3, E4, E5 that lead to vertices V1, V2, V3, V4, V5, the call vertex.edges(Direction.OUT) returns an iterator over all five edges. However, if I have a vertex V0 with edges E1, E2, E3 that lead to V1, V2, V3, and self-loop edges EL1, EL2, EL3, the iterator will iterate over E1, E2, E3, and only one of (EL1, EL2, EL3), giving a total of four edges instead of the expected six (see the sketch below).
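
For reproduction, the second scenario can be built like this (a sketch; the "link" label is made up):

// Build V0 with three regular OUT edges and three self-loops.
JanusGraph graph = JanusGraphFactory.open("inmemory");
JanusGraphVertex v0 = graph.addVertex();
JanusGraphVertex v1 = graph.addVertex();
JanusGraphVertex v2 = graph.addVertex();
JanusGraphVertex v3 = graph.addVertex();
v0.addEdge("link", v1); // E1
v0.addEdge("link", v2); // E2
v0.addEdge("link", v3); // E3
v0.addEdge("link", v0); // EL1
v0.addEdge("link", v0); // EL2
v0.addEdge("link", v0); // EL3
graph.tx().commit();
// Iterating v0.edges(Direction.OUT) directly yields all six edges; only the
// Hadoop InputFormat path collapses the three self-loops into one.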

After further analysis, I came upon this commit:

https://github.com/JanusGraph/janusgraph/commit/d3006dc939c1b640bb263806abd3fd6bee630d12

which explicitly added code that skips deserializing multiple self-loop edges. The code from the linked commit is still present in org.janusgraph:janusgraph-hadoop:0.5.3 and seems to be the cause of this unexpected behavior.

My questions are as follows:

  1. What is the reason behind implementing the change from the given commit?
  2. Is there another way to iterate on all edges, including (possibly) multiple self-loop edges with the same edge label?

Kind regards,

Mladen Marović


Re: Difference Between JanusGraph Server and Embedded JanusGraph in Java

hadoopmarc@...
 

Hi Zach,

1. For building an API service you do not need Gremlin Server. Gremlin Server does have all kinds of features that might (slightly) reduce the complexity of your service (at the cost of the added complexity of maintaining Gremlin Server). The main driver for using Gremlin Server is its support for Gremlin Language Variants, which you do not need.
Resource usage should not differ very much for similar workloads and comparable settings; Gremlin Server requires an additional JVM, but may be more optimized than what you build in-house.

2. First check with the Gremlin Console that you can connect to Gremlin Server. If that works, please report more details about the visualization tool you use.
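
A minimal programmatic check is also possible (TinkerPop gremlin-driver; host and port are the usual defaults, adjust to your setup):

// Quick connectivity check against Gremlin Server, equivalent to the
// Console test.
Cluster cluster = Cluster.build("localhost").port(8182).create();
Client client = cluster.connect();
long count = client.submit("g.V().count()").one().getLong();
System.out.println("vertices visible to the server: " + count);
cluster.close();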

Best wishes,     Marc


Difference Between JanusGraph Server and Embedded JanusGraph in Java

Zach B.
 

I've seen a lot of discussion about the benefits of both implementations, but I was wondering whether there is a big difference in terms of resource usage. I'm building an API service that will be deployed to a low-resource virtual machine, and I was wondering in particular about the difference in memory usage between the two implementations.

On an unrelated note, I have been developing with the embedded implementation, using HBase as the storage backend. I wanted to use a visualization tool to check that my graph looks the way I expect, but all the tools I've seen require Gremlin Server. So I started the server with the exact same HBase configuration as the embedded version, yet it displays an empty graph. Does anyone know why that is?

Thank you in advance.


Re: Getting org.janusgraph.graphdb.database.idassigner.IDPoolExhaustedException consistently

hadoopmarc@...
 

Hi,

There does not seem to be much that helps in finding a root cause (no similar questions or issues in the archives). The most helpful thing I found is the following javadoc:
https://javadoc.io/doc/org.janusgraph/janusgraph-core/latest/org/janusgraph/graphdb/database/idassigner/placement/SimpleBulkPlacementStrategy.html

Assuming that you use this default SimpleBulkPlacementStrategy, what value do you use for ids.num-partitions? The default number might be too small. At the beginning of a Spark job, the tasks can be more or less synchronized, that is, they finish after about the same amount of time and then cause congestion (task number 349 ...). If this is the case, other configs could help too (a combined sketch follows the list):

  • ids.renew-percentage — if you increase this value, congestion is avoided a bit, but this cannot have a high impact.
  • ids.flush — I assume you did not change the default "true" value.
  • ids.authority.conflict-avoidance-mode — undocumented, but it addresses contention during ID block reservation.
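
As a concrete starting point, a combined sketch of these settings (values are illustrative, not tested recommendations):

JanusGraph graph = JanusGraphFactory.build()
    .set("storage.backend", "cql")
    .set("storage.hostname", "127.0.0.1")
    .set("ids.block-size", 5000000)
    .set("ids.num-partitions", 32)     // spread 300 executors over more ID pools
    .set("ids.renew-percentage", 0.5)  // renew ID blocks earlier to soften congestion
    .open();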

Best wishes,    Marc


Getting org.janusgraph.graphdb.database.idassigner.IDPoolExhaustedException consistently

sauverma
 

Hi

I am getting the below exception while ingesting data to an existing graph

Job aborted due to stage failure: Task 349 in stage 2.0 failed 10 times, most recent failure: Lost task 349.9 in stage 2.0 (TID 2524, dproc-connect-graph1-prod-us-sw-xwv9.c.zeotap-prod-datalake.internal, executor 262): org.janusgraph.graphdb.database.idassigner.IDPoolExhaustedException: Could not find non-exhausted partition ID Pool after 1000 attempts

The value of `ids.block-size` is set to 5000000 (5M), and I am using Spark for data loading (around 300 executors per run).

Could you please suggest configuration changes that could fix this issue?

Thanks


Re: Backend data model deserialization

Elliot Block <eblock@...>
 

Awesome, thank you all for the great info and the recent presentations! We are prototyping bulk export + deserialization from Cloud Bigtable over approximately the next week and will try to report back if we can produce something useful to share. Thanks again, -Elliot
 
On Thu, May 20, 2021 at 6:45 AM sauverma <saurabhdec1988@...> wrote:
At Zeotap we've taken the same route to enable OLAP consumers via Apache Spark. We presented it at the recent JanusGraph meetup: https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376. We are using ScyllaDB as the backend.
  
On Thu, May 20, 2021, 6:12 PM Boxuan Li <liboxuan@...> wrote:
If you want to resort to the source code, you could check out EdgeSerializer and IndexSerializer. Here is a simple code snippet demonstrating how to deserialize an edge:
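
A rough sketch of that idea (treat as an assumption: method names per org.janusgraph.graphdb.database.EdgeSerializer, and `entry` is a placeholder for a raw Entry read from a backend row):

StandardJanusGraph graph = (StandardJanusGraph) JanusGraphFactory.open(config);
StandardJanusGraphTx tx = (StandardJanusGraphTx) graph.newTransaction();
EdgeSerializer serializer = graph.getEdgeSerializer();

// 'entry' stands in for an org.janusgraph.diskstorage.Entry read straight
// from a backend row; the transaction doubles as the TypeInspector.
RelationCache rel = serializer.readRelation(entry, false, tx);
System.out.println("type id: " + rel.typeId
    + ", other vertex id: " + rel.getOtherVertexId());
tx.rollback();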
 
On May 20, 2021, at 8:07 PM, hadoopmarc@... wrote:
If you look back at this week's OLAP presentations (https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376) you will see that one of the presenters did exactly what you propose: they exported rows from ScyllaDB and converted them to Gryo format for import into TinkerPop's HadoopGraph. You might want to contact them to coordinate a possible contribution to the JanusGraph project.
