
Re: Count Query

hadoopmarc@...
 

Hi Vinayak,

Can you please provide a script to generate a sample graph so that we can work on the same graph? The current question is a follow-up on an earlier discussion on this list in May, see https://lists.lfaidata.foundation/g/janusgraph-users/topic/82659112 .

Best wishes,     Marc


Count Query

Vinayak Bali
 

Hi All, 

Please take a look at the following issue and provide your feedback.

Thanks & Regards,
Vinayak


Re: Issues while iterating over self-loop edges in Apache Spark

Mladen Marović
 

Thanks for the reply. I created an issue at: https://github.com/JanusGraph/janusgraph/issues/2669

Kind regards,

Mladen Marović


Dynamic control of graph configuration

fredrick.eisele@...
 

Which, if any, of the graph configuration properties are dynamic?

In particular, I would like to toggle `storage.batch-loading`.


Re: Transaction Recovery and Bulk Loading

Boxuan Li
 

Hi Marc,

I am not familiar with batch-loading, but from what I understand, the reason might be performance. Batch-loading aims to persist data as fast as possible, at the expense of features such as consistency checks. A write-ahead log (WAL) would certainly slow down the bulk loading process.

Also, technically, when batch-loading is enabled, there is a chance that your data gets persisted to your data storage in the StandardJanusGraph::prepareCommit method, which happens before the WAL is written. When batch-loading is disabled, your data is always persisted only after the WAL is written. I am not sure if there is any particular reason for this, but I guess it is by design.

There might or might not be other reasons behind the design choice, but performance is what comes to my mind when I see your question.

Cheers,
Boxuan

On Wed, Jun 9, 2021 at 10:53 PM, madams via lists.lfaidata.foundation <madams=open-systems.com@...> wrote:

Hi all,

We've been integrating our pipelines with JanusGraph for some time now, and it's been working great, thanks to the developers!
We use the transaction recovery job and enabled batch-loading for performance, and then we realized that the write-ahead transaction log is not used when batch-loading is enabled.
Out of curiosity, is there any reason for this?
At the moment we have disabled batch-loading and consistency checks. We've thought about replacing the transaction recovery with a reindexing job, but reindexing is quite a heavy operation.

Thanks,
Best Regards,
Marc


Re: Issues while iterating over self-loop edges in Apache Spark

hadoopmarc@...
 

Hi Mladen,

Indeed, the self-loop logic you point to still exists in:
https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-hadoop/src/main/java/org/janusgraph/hadoop/formats/util/JanusGraphVertexDeserializer.java

I guess the intent of filtering these self-loop edges is to prevent a single self-loop edge from appearing twice, once as an IN edge and once as an OUT edge. I also guess that the actual implementation is buggy: it is not the responsibility of the InputFormat to filter any data (your example!) but rather to faithfully represent all data present. Can you report an issue for this at https://github.com/JanusGraph/janusgraph/issues ?

This also means that there is not an easy way out, other than starting with a private build with a fix (and possibly contributing the fix as a PR).
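
If the guess about the intent is right, that goal could be met without dropping distinct edges by de-duplicating on edge identity rather than on label. The sketch below is plain Java and purely illustrative: `EdgeRef`, its `id` field and `dedupeById` are hypothetical names, and this is neither the code in JanusGraphVertexDeserializer nor a proposed patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SelfLoopDedupSketch {
    /** Hypothetical edge reference as it might be seen while reading one vertex. */
    static final class EdgeRef {
        final long id;
        final String label;
        final String direction; // "OUT" or "IN"

        EdgeRef(long id, String label, String direction) {
            this.id = id;
            this.label = label;
            this.direction = direction;
        }
    }

    /** Keeps the first occurrence of every edge id, preserving encounter order. */
    static List<EdgeRef> dedupeById(List<EdgeRef> seen) {
        Map<Long, EdgeRef> byId = new LinkedHashMap<>();
        for (EdgeRef e : seen) {
            byId.putIfAbsent(e.id, e); // a self-loop seen as OUT and IN is kept once
        }
        return new ArrayList<>(byId.values());
    }
}
```

With three self-loop edges that share a label and are each encountered as OUT and as IN, this returns three edges instead of six or one.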

Best wishes,    Marc


Re: Janus Graph Performance with Cassandra vs BigTable

hadoopmarc@...
 

Hi Vishal,

Your question is very general. What is most important to you: write performance, simple queries, complex queries? Do you mean a comparison between managed Cassandra and managed Bigtable in terms of the Euros needed for a specific workload? I am not aware of independent benchmark results for the JanusGraph storage backends, while vendors can be skimpy about the circumstances of the benchmarks they present.

Some general notions:
  • Cassandra has a non-Java drop-in replacement, ScyllaDB, so large JanusGraph deployments often use ScyllaDB; see the materials on https://janusgraph.org/
  • JanusGraph does not have much code that is specific to any one storage backend, but the adapters were designed with Cassandra in mind
  • Compatibility between JanusGraph and Bigtable is only maintained indirectly, through JanusGraph-HBase compatibility (I am not aware that this has caused problems in the past, though)
Best wishes,     Marc


Re: a problem about elasticsearch

Peter Corless
 

Is it ES [the software] that is bottlenecking, or could it be the HW you have it running on? If the HW isn't the issue, have you been able to trace where the issue is in ES?

I hate to be "that guy," but if the underlying storage engine isn't keeping up, you have options with JanusGraph. Of course, if you resolve the issue and can keep running on ES, I am all for least-disruptive solutions.

But if not, I'd be remiss to not put in a plug for Scylla as a better performing option as a JanusGraph data store.

Hope you get it resolved!

On Fri, Jun 11, 2021, 1:37 AM <anjanisingh22@...> wrote:
Hi Anshul,

I am facing the same issue. Did you find a solution for it?

Thanks,
Anjani


Re: a problem about elasticsearch

anjanisingh22@...
 

Hi Anshul,

I am facing the same issue. Did you find a solution for it?

Thanks,
Anjani


Janus Graph Performance with Cassandra vs BigTable

Vishal Gupta <vgupta@...>
 

Hi Community/Team, 

I see that JanusGraph can be integrated with multiple storage backends like Cassandra and Bigtable.
I am trying to evaluate which storage backend is more performant for JanusGraph.

Do people have any recommendations here? Has anyone done a performance comparison of JanusGraph + Bigtable vs. JanusGraph + Cassandra?

Thanks
Vishal


Transaction Recovery and Bulk Loading

madams@...
 

Hi all,

We've been integrating our pipelines with JanusGraph for some time now, and it's been working great, thanks to the developers!
We use the transaction recovery job and enabled batch-loading for performance, and then we realized that the write-ahead transaction log is not used when batch-loading is enabled.
Out of curiosity, is there any reason for this?
At the moment we have disabled batch-loading and consistency checks. We've thought about replacing the transaction recovery with a reindexing job, but reindexing is quite a heavy operation.

Thanks,
Best Regards,
Marc


Issues while iterating over self-loop edges in Apache Spark

Mladen Marović
 

Hello,

While debugging some Apache Spark jobs that process data from a JanusGraph graph, I noticed some issues with self-loop edges (edges that connect a vertex to itself). The data is read using:

javaSparkContext.newAPIHadoopRDD(hadoopConfiguration(), CqlInputFormat.class, NullWritable.class, VertexWritable.class)

When I try to process all outbound edges of a single vertex using:

vertex.edges(Direction.OUT)

and that vertex has multiple self-loop edges with the same edge label, the iterator always returns only one such edge. Edges that are not self-loop are all returned as expected.

To give a specific example, if I have a vertex V0 with edges E1, E2, E3, E4, E5 that lead to vertices V1, V2, V3, V4, V5, the call vertex.edges(Direction.OUT) will return an iterator that iterates over all five edges. However, if I have a vertex V0 with edges E1, E2, E3 that lead to V1, V2, V3, and self-loop edges EL1, EL2, EL3, the iterator will iterate over E1, E2, E3, and only one of (EL1, EL2, EL3), giving a total of four edges instead of the expected six.
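
The counts in this example can be reproduced with a toy model in plain Java. This is only a sketch of the observed behaviour, not JanusGraph code: the `Edge` class, the `"knows"` label and the `keepOneSelfLoopPerLabel` helper are all hypothetical, and the filter rule is an assumption inferred from the symptom described above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SelfLoopToyModel {
    /** Hypothetical edge: a name, its out-vertex, its in-vertex and a label. */
    static final class Edge {
        final String name, out, in, label;

        Edge(String name, String out, String in, String label) {
            this.name = name;
            this.out = out;
            this.in = in;
            this.label = label;
        }

        boolean isSelfLoop() {
            return out.equals(in);
        }
    }

    /** Mimics the observed behaviour: at most one self-loop edge per label survives. */
    static List<Edge> keepOneSelfLoopPerLabel(List<Edge> edges) {
        Set<String> seenLoopLabels = new HashSet<>();
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            // Set.add returns false once a self-loop with this label was already kept
            if (!e.isSelfLoop() || seenLoopLabels.add(e.label)) {
                kept.add(e);
            }
        }
        return kept;
    }

    /** Builds E1..E3 (V0 to V1..V3) and the self-loops EL1..EL3 (V0 to V0). */
    static List<Edge> exampleEdges() {
        List<Edge> edges = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            edges.add(new Edge("E" + i, "V0", "V" + i, "knows"));
            edges.add(new Edge("EL" + i, "V0", "V0", "knows"));
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Edge> all = exampleEdges();
        System.out.println("stored:   " + all.size());                          // 6
        System.out.println("returned: " + keepOneSelfLoopPerLabel(all).size()); // 4
    }
}
```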

After further analysis, I came upon this commit:

https://github.com/JanusGraph/janusgraph/commit/d3006dc939c1b640bb263806abd3fd6bee630d12

which explicitly added code that skips deserializing multiple self-loop edges. The code from the linked commit is still present in org.janusgraph:janusgraph-hadoop:0.5.3 and seems to be the cause of this unexpected behavior.

My questions are as follows:

  1. What is the reason behind implementing the change from the given commit?
  2. Is there another way to iterate on all edges, including (possibly) multiple self-loop edges with the same edge label?

Kind regards,

Mladen Marović


Re: Difference Between JanusGraph Server and Embedded JanusGraph in Java

hadoopmarc@...
 

Hi Zach,

1. For building an API service you do not need Gremlin Server. Gremlin Server does have all kinds of features that might (slightly) reduce the complexity of your service, at the cost of the added complexity of maintaining Gremlin Server. The main driver for using Gremlin Server is the support for Gremlin Language Variants, which you do not need.
Resource usage should not differ very much for similar workloads and comparable settings; Gremlin Server requires an additional JVM, but might be more optimized than what you build in house.

2. First check whether you can connect to Gremlin Server from the Gremlin Console. If that works, please report more details about which visualization tool you use.

Best wishes,     Marc


Difference Between JanusGraph Server and Embedded JanusGraph in Java

Zach B.
 

I've seen a lot of discussion about the benefits and such of both implementations but I was wondering if there was a big difference in terms of resource usage? I'm building an API service that will be deployed to a low resource virtual machine and I was wondering if there was a big difference between the memory usage of the two implementations.

On an unrelated note, I have been developing with the Embedded implementation, using HBase as a storage backend. I wanted to use a visualization tool to see if my graph looks the way I want, but all the tools I see require gremlin-server. So I started up the server using exactly the same HBase configuration as Embedded, but it displays an empty graph. Does anyone know why that is the case?

Thank you in advance.


Re: Getting org.janusgraph.graphdb.database.idassigner.IDPoolExhaustedException consistently

hadoopmarc@...
 

Hi,

There does not seem to be much that helps in finding a root cause (no similar questions or issues in history). The most helpful thing I found is the following javadoc:
https://javadoc.io/doc/org.janusgraph/janusgraph-core/latest/org/janusgraph/graphdb/database/idassigner/placement/SimpleBulkPlacementStrategy.html

Assuming that you use this default SimpleBulkPlacementStrategy, what value do you use for ids.num-partitions? The default number might be too small. At the beginning of a Spark job, the tasks can be more or less synchronized, that is, they finish after about the same amount of time and then cause congestion (task number 349 ...). If this is the case, other configs could help too:

ids.renew-percentage                    If you increase this value, congestion is avoided a bit, but this cannot have a high impact.
ids.flush                               I assume you did not change the default "true" value.
ids.authority.conflict-avoidance-mode   Undocumented, but talks about contention during ID block reservation.
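
For reference, these options could be collected as follows before opening the graph. This is a sketch in plain Java using java.util.Properties; the class and method names are hypothetical, and every value shown is an illustrative assumption rather than a recommendation, so check it against the configuration reference for your JanusGraph version.

```java
import java.util.Properties;

class IdAllocationSettings {
    /** Returns hypothetical overrides for the ID allocation options discussed above. */
    static Properties idOverrides() {
        Properties p = new Properties();
        // Assumed value: more partitions than the default, to spread concurrent writers
        p.setProperty("ids.num-partitions", "64");
        // Assumed value: a fraction between 0 and 1; higher means blocks are renewed earlier
        p.setProperty("ids.renew-percentage", "0.5");
        // The default mentioned above, written out explicitly
        p.setProperty("ids.flush", "true");
        // ids.authority.conflict-avoidance-mode is left out on purpose: take its
        // accepted values from the configuration reference rather than from this sketch.
        return p;
    }

    public static void main(String[] args) {
        idOverrides().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```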

Best wishes,    Marc


Getting org.janusgraph.graphdb.database.idassigner.IDPoolExhaustedException consistently

sauverma
 

Hi

I am getting the below exception while ingesting data to an existing graph

Job aborted due to stage failure: Task 349 in stage 2.0 failed 10 times, most recent failure: Lost task 349.9 in stage 2.0 (TID 2524, dproc-connect-graph1-prod-us-sw-xwv9.c.zeotap-prod-datalake.internal, executor 262): org.janusgraph.graphdb.database.idassigner.IDPoolExhaustedException: Could not find non-exhausted partition ID Pool after 1000 attempts

The value of `ids.block-size` is set to 5000000 (5M) and I am using Spark for data loading (around 300 executors per run).

Could you please suggest the configuration which can fix this issue?

Thanks


Re: Backend data model deserialization

Elliot Block <eblock@...>
 

Awesome thank you all for the great info and recent presentations!  We are prototyping bulk export + deserialize from Cloud Bigtable over approx. the next week and will try to report back if we can produce something useful to share.  Thanks again, -Elliot
 
On Thu, May 20, 2021 at 6:45 AM sauverma <saurabhdec1988@...> wrote:
At zeotap we've taken the same route to enable OLAP consumers via Apache Spark. We presented it in the recent JanusGraph meet-up at https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376. We are using ScyllaDB as the backend.
  
On Thu, May 20, 2021, 6:12 PM Boxuan Li <liboxuan@...> wrote:
If you want to resort to the source code, you could check out EdgeSerializer and IndexSerializer. Here is a simple code snippet demonstrating how to deserialize an edge:
 
On May 20, 2021, at 8:07 PM, hadoopmarc@... wrote:
If you look back at this week's OLAP presentations (https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376) you will see that one of the presenters did exactly what you propose: they exported rows from ScyllaDB and converted them to gryo format for import into TinkerPop HadoopGraph. You might want to contact them to coordinate a possible contribution to the JanusGraph project.


Re: ID block allocation exception while creating edge

hadoopmarc@...
 

Hi Anjani,

One thing that does not feel good is that you create and commit a transaction for every row of your dataframe. Although I do not see how this would interfere with ID allocation, best practice is to have partitions of about 10,000 vertices/edges and commit each of these as one batch. In case of an exception, you roll back the transaction and raise your own exception. After that, Spark will retry the partition and your job will still succeed. It is worth a try.
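
The commit-per-partition pattern can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not JanusGraph or Spark code: `Tx` is a hypothetical stand-in for a graph transaction, and `writePartition` and `writeRow` are made-up names for the per-partition function and the per-row write.

```java
import java.util.Iterator;
import java.util.function.Consumer;

class PartitionBatchWriter {
    /** Hypothetical stand-in for a graph transaction; not a JanusGraph type. */
    interface Tx {
        void commit();
        void rollback();
    }

    /**
     * Writes every row of one partition inside a single transaction.
     * Commits once at the end; on any failure, rolls back and rethrows so
     * that the caller (for example a Spark task) can retry the partition.
     */
    static <T> int writePartition(Iterator<T> rows, Tx tx, Consumer<T> writeRow) {
        int written = 0;
        try {
            while (rows.hasNext()) {
                writeRow.accept(rows.next());
                written++;
            }
            tx.commit(); // one commit per partition instead of one per row
            return written;
        } catch (RuntimeException e) {
            tx.rollback(); // nothing from this partition is persisted
            throw new IllegalStateException(
                "Partition failed after " + written + " rows; rolled back", e);
        }
    }
}
```

Because a failed partition leaves nothing committed, retrying it does not produce duplicates, provided the per-row write has no side effects outside the transaction.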

Best wishes,    Marc


Re: Making janus graph client to not use QUORUM

anjanisingh22@...
 

Thanks Marc, I will try that option.


Re: ID block allocation exception while creating edge

anjanisingh22@...
 

I am sharing details on how I create nodes and edges, to make sure that nothing in this approach is causing the ID allocation failures.

 

I create one static JanusGraph instance on each Spark worker machine and use it to create and commit multiple transactions.

pairRDD.foreachPartition(partIterator -> {
    partIterator.forEachRemaining(tuple -> {
        createNodeAndEdge(tuple, JanusGraphConfig.getJanusGraph(janusConfig));
    });
});

where JanusGraphConfig.getJanusGraph returns the static instance.

 

In the createNodeAndEdge() method I create a GraphTraversalSource from the static janusGraph, create the node and the edge, commit, and then close the GraphTraversalSource object, as shown below in pseudocode:

void createNodeAndEdge(Tuple2<K, V> pair, JanusGraph janusGraph) {
    // A new transaction is started for every input row
    GraphTraversalSource g = janusGraph.buildTransaction().start().traversal();
    try {
        // create node
        // create edge
        g.tx().commit();
    } catch (Exception e) {
        // undo the partial write for this row
        g.tx().rollback();
    } finally {
        g.tx().close();
        g.close();
    }
}

 

Thanks,
Anjani
