
HBase read-after-write not working with janusgraph-0.6.1 but was working with janusgraph-0.6.0

Nikita Pande
 

So basically g.addV() is adding vertices to the HBase server, but retrieval after a restart of the Gremlin console is not happening.
Step 1: In the Gremlin console, run g2.addV(); g2.V().count() returns 124.

Step 2: Restart the Gremlin console and run g2.V().count():

gremlin> g2.V().count()

14:04:54 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes

==>123

This is still the old value.

However, the same works with janusgraph-0.6.0.


Re: Bulk Loading with Spark

Joe Obernberger
 

Thank you Marc - something isn't right with my code - debugging.  Right now the graph is 4,339,690 vertices and 15,707,179 edges, but that took days to build, and is probably 5% of the data.
Querying the graph is fast.

-Joe

On 5/22/2022 7:53 AM, hadoopmarc@... wrote:
Hi Joe,

What is slow? Can you please check the Expero blog series and compare to their reference numbers (per parallel spark task):

https://www.experoinc.com/post/janusgraph-nuts-and-bolts-part-1-write-performance

Best wishes,

Marc






Re: Bulk Loading with Spark

hadoopmarc@...
 

Hi Joe,

What is slow? Can you please check the Expero blog series and compare to their reference numbers (per parallel spark task):

https://www.experoinc.com/post/janusgraph-nuts-and-bolts-part-1-write-performance

Best wishes,

Marc


Re: Bulk Loading with Spark

Joe Obernberger
 

Should have added - I'm connecting with:

JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "cql")
                .set("storage.hostname", "charon:9042, chaos:9042")
                .set("storage.cql.keyspace", "graph")
                .set("storage.cql.cluster-name", "JoeCluster")
                .set("storage.cql.only-use-local-consistency-for-system-operations", "true")
                .set("storage.cql.batch-statement-size", 256)
                .set("storage.cql.local-max-connections-per-host", 8)
                .set("storage.cql.read-consistency-level", "ONE")
                .set("storage.batch-loading", true)
                .set("schema.default", "none")
                .set("ids.block-size", 100000)
                .set("storage.buffer-size", 16384)
                .open();


-Joe

On 5/20/2022 5:28 PM, Joe Obernberger via lists.lfaidata.foundation wrote:
Hi All - I'm trying to use Spark to do a bulk load, but it's very slow.
The Cassandra cluster I'm connecting to is a bare-metal, 15-node cluster.

I'm using Java code to do the loading, calling
GraphTraversalSource.addV and Vertex.addEdge in a loop.

Is there a better way?

Thank you!

-Joe



Bulk Loading with Spark

Joe Obernberger
 

Hi All - I'm trying to use Spark to do a bulk load, but it's very slow.
The Cassandra cluster I'm connecting to is a bare-metal, 15-node cluster.

I'm using Java code to do the loading, calling
GraphTraversalSource.addV and Vertex.addEdge in a loop.

Is there a better way?

Thank you!

-Joe
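Whatever the eventual Spark-side answer, a large share of single-client loading cost usually comes from committing per element rather than per batch. The following is a minimal sketch of that batching pattern only — `add_vertex` and `commit` are hypothetical stand-ins, not JanusGraph's actual API; a real loader would call the graph transaction instead:

```python
# Batch-commit pattern: amortize transaction overhead over many mutations.
# `add_vertex` and `commit` are hypothetical callables supplied by the caller;
# with JanusGraph you would wire in the real mutation and commit operations.
def load_in_batches(records, add_vertex, commit, batch_size=1000):
    """Add records one by one but commit only once per batch.

    Returns the number of commits performed.
    """
    pending = 0
    commits = 0
    for record in records:
        add_vertex(record)
        pending += 1
        if pending >= batch_size:
            commit()
            commits += 1
            pending = 0
    if pending:                 # flush the final partial batch
        commit()
        commits += 1
    return commits

if __name__ == "__main__":
    added = []
    n = load_in_batches(range(2500), added.append, lambda: None)
    print(n)   # 3 commits for 2500 records at batch_size=1000
```

The right batch size depends on the backend; the `storage.buffer-size` and `storage.cql.batch-statement-size` settings shown elsewhere in this thread interact with it.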




Re: NoSuchMethodError

hadoopmarc@...
 

Thanks for reporting back! Unfortunately, Java still has big problems using multiple versions of the same library, so your current approach is no guarantee of success for future version updates (and even your current project may suffer from side effects you have not encountered yet):

https://www.theserverside.com/tip/Problems-with-Java-modules-still-plague-developers

A good point of reference is running JanusGraph OLAP queries in the Gremlin Console using spark master = local[*], because an extensive test suite ran successfully for each release. But practical projects, like yours, have other requirements than those served by the Gremlin Console.

Best wishes,     Marc


Re: NoSuchMethodError

Joe Obernberger
 

Oh - my apologies.  I'm using 0.6, and Cassandra 4.0.4.

What I eventually did was get the source from GitHub and compile it with Cassandra 4.  I'm using Spark 3.2.1.

-Joe

On 5/19/2022 1:50 AM, hadoopmarc@... wrote:
Hi Joe,

Your issue description is not complete. To start:
  • what version of JanusGraph do you use?
  • what spark master do you use?

The easiest way to find the guava version of JanusGraph is to download the zip distribution and check the lib folder.

Best wishes,    Marc






Re: NoSuchMethodError

Joe Obernberger
 

The answer to my problem was telling spark-submit to use the user classes first with:
--conf spark.executor.userClassPathFirst=true
--conf spark.driver.userClassPathFirst=true

-Joe

On 5/18/2022 9:57 AM, Joe Obernberger via lists.lfaidata.foundation wrote:

Hi All - I'm trying to write a Spark job to load data from Cassandra into a graph, but I'm getting this:

Exception in thread "main" java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object)'
        at org.janusgraph.diskstorage.configuration.ConfigElement.<init>(ConfigElement.java:38)
        at org.janusgraph.diskstorage.configuration.ConfigNamespace.<init>(ConfigNamespace.java:32)
        at org.janusgraph.diskstorage.configuration.ConfigNamespace.<init>(ConfigNamespace.java:37)
        at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<clinit>(GraphDatabaseConfiguration.java:93)
        at org.janusgraph.core.JanusGraphFactory$Builder.open(JanusGraphFactory.java:275)

Which version of guava should I be using in the allinone jar? I've tried many different ones, but no luck.
Thanks!

-Joe



Re: NoSuchMethodError

hadoopmarc@...
 

Hi Joe,

Your issue description is not complete. To start:
  • what version of JanusGraph do you use?
  • what spark master do you use?

The easiest way to find the guava version of JanusGraph is to download the zip distribution and check the lib folder.

Best wishes,    Marc


NoSuchMethodError

Joe Obernberger
 

Hi All - I'm trying to write a Spark job to load data from Cassandra into a graph, but I'm getting this:

Exception in thread "main" java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object)'
        at org.janusgraph.diskstorage.configuration.ConfigElement.<init>(ConfigElement.java:38)
        at org.janusgraph.diskstorage.configuration.ConfigNamespace.<init>(ConfigNamespace.java:32)
        at org.janusgraph.diskstorage.configuration.ConfigNamespace.<init>(ConfigNamespace.java:37)
        at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<clinit>(GraphDatabaseConfiguration.java:93)
        at org.janusgraph.core.JanusGraphFactory$Builder.open(JanusGraphFactory.java:275)

Which version of guava should I be using in the allinone jar? I've tried many different ones, but no luck.
Thanks!

-Joe




Re: [DISCUSS] Moving from Gitter to Discord

nidhi.vinaykiya27@...
 

I agree. Moving to Discord would be much better.


[DISCUSS] Moving from Gitter to Discord

Florian Hockmann
 

Hi,

JanusGraph currently uses Gitter as its chat platform. We noticed, however, that Discord is becoming more and more popular for OSS communities. TinkerPop has also recently started a Discord server, which is seeing a lot of activity and where questions about JanusGraph are also frequently asked (and answered).

So, we are currently evaluating whether it makes sense for JanusGraph to also move to Discord. Therefore, it’s important for us to know what you think:

Would you welcome the migration to Discord in general?

Would you rather use Discord than Gitter?

Or would you prefer if we stay on Gitter?

 

Let us know what you think.

 

Regards,

Florian

 


Re: Rapid deletion of vertices

Scott Friedman
 

Thanks Boxuan, Marc, and Eric.

I implemented Boxuan's get-vertex-ids-and-delete-in-parallel suggestion with 8 gremlinpython workers, and it saves an order of magnitude of time.  I imagine it could scale up further with more parallelism.  That's some great time savings, thank you!

Marc, good idea to snapshot and then do a full down-and-up.  I assume we'd have to take down Cassandra and Elasticsearch as well, and then start their Docker images back up with substituted volumes.  This would obviously outperform batched deletion for millions/billions of vertices.

Eric, it sounds like your approach may work while keeping the data store (e.g., Cassandra) and indexer (e.g., Elasticsearch) alive, which could improve efficiency over a full tear-down.  We'll consider this as well, probably identifying some Docker-based analogs for some of the JanusGraph shell commands.  Thanks!
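For reference, the get-ids-then-delete-in-parallel pattern can be sketched as below. This is a minimal sketch, not the actual implementation: `drop_batch` is a stand-in for a real per-worker gremlinpython call (something like dropping the vertices with the given ids and iterating the traversal), and the batch and worker counts are illustrative only.

```python
# Sketch of the suggestion: collect all vertex ids first, then hand
# fixed-size id batches to a pool of workers that delete them in parallel.
# `drop_batch` is a caller-supplied stand-in for the real gremlinpython call.
from concurrent.futures import ThreadPoolExecutor

def chunks(items, size):
    """Split a list of vertex ids into fixed-size batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def delete_all(vertex_ids, drop_batch, batch_size=2000, workers=8):
    """Delete every id by mapping batches over a thread pool.

    `drop_batch` takes one batch of ids and returns how many it deleted.
    """
    batches = chunks(vertex_ids, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        deleted = sum(pool.map(drop_batch, batches))
    return deleted

if __name__ == "__main__":
    ids = list(range(6226))                    # e.g. collected via g.V().id().toList()
    dropped = delete_all(ids, drop_batch=len)  # stub deleter: just counts the batch
    print(dropped)                             # 6226
```

Each worker should hold its own traversal/connection; sharing a single gremlinpython connection across threads is the main thing to avoid here.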


Re: Rapid deletion of vertices

eric.neufeld@...
 

Hello Scott,

I had the same situation but with much more data. The fastest way was stopping the server, then clearing everything, starting it again, and recreating the schema:

bin/janusgraph.sh stop
bin/janusgraph.sh clear
bin/janusgraph.sh start
bin/gremlin.sh -i scripts/init-myGraph.groovy

Of course these steps could be added to an sh script like resetjanusgraph.sh.


In init-myGraph.groovy I added something like:

:remote connect tinkerpop.server conf/.....-remote.yaml
:remote console
:load data/gravity-my-schema.groovy
defineMySchema(graph)
:q
 

In data/gravity-my-schema.groovy I define that Groovy function defineMySchema(graph):


//#!/usr/bin/env groovy
 
def defineMySchema(graph) {
 
    // Create graph schema and indexes, if they haven't already been created
    m = graph.openManagement()
    
    println 'setup schema'
 
    if (m.getPropertyKey('name') == null) {
 
      la = m.makeVertexLabel("la").make()

....

Maybe this helps,
Eric


Re: Rapid deletion of vertices

hadoopmarc@...
 

Hi Scott,

Another approach is to take snapshots of the Cassandra tables and Elasticsearch indices after creating the schema and indices.

Note that there are some subtleties when taking snapshots of non-empty graphs (not your present use case), see:

https://lists.lfaidata.foundation/g/janusgraph-users/topic/82475527#5867

Best wishes,    Marc


Re: Rapid deletion of vertices

Boxuan Li
 

Hi Scott,

One idea that first came into my mind is to first collect all vertex ids, and then delete them in batch & in parallel using multi-threading.

Best regards,
Boxuan

On May 5, 2022, at 5:57 PM, Scott Friedman <friedman@...> wrote:

Good afternoon,

We're running a docker-compose janusgraph:0.6.1 with cassandra:3 and elasticsearch:6.6.0.  We're primarily utilizing JanusGraph within Python 3.8 via gremlinpython.

We frequently reset our graph store to run an experiment or demonstration.  To date, we've either (1) dropped the graph and re-loaded our schema and re-defined our indices or (2) deleted all the vertices to maintain the schema and indices.  Often #2 is faster (and less error-prone), but it's slower for large graphs.  I hope somebody can lend some advice that will speed up our resetting-the-graph workflow with JanusGraph.

For deleting 6K nodes (and many incident edges), here's the timing data:

2022-05-05 16:40:44,261 - INFO - Deleting batch 1.

2022-05-05 16:41:09,961 - INFO - Deleting batch 2.

2022-05-05 16:41:27,689 - INFO - Deleting batch 3.

2022-05-05 16:41:43,678 - INFO - Deleting batch 4.

2022-05-05 16:41:45,561 - INFO - Deleted 6226 vertices over 4 batch(es).


...so it takes roughly 1 minute to delete 6K vertices in batches of 2000.

Here's our Python code for deleting the nodes:

        batches = 0
        nodes = 0
        while True:
            batches += 1
            log(f'Deleting batch {batches}.')
            num_nodes = g.V().limit(batch_size).sideEffect(__.drop()).count().next()
            nodes += num_nodes
            if num_nodes < batch_size:
                break
        log(f'Deleted {nodes} nodes over {batches} batch(es).')

This never fails, but it's obviously quite slow, especially for larger graphs.  Is there a way to speed this up?  We haven't tried running it async, since we're not sure how to do so safely.

Thanks in advance for any wisdom!

Scott


Rapid deletion of vertices

Scott Friedman
 

Good afternoon,

We're running a docker-compose janusgraph:0.6.1 with cassandra:3 and elasticsearch:6.6.0.  We're primarily utilizing JanusGraph within Python 3.8 via gremlinpython.

We frequently reset our graph store to run an experiment or demonstration.  To date, we've either (1) dropped the graph and re-loaded our schema and re-defined our indices or (2) deleted all the vertices to maintain the schema and indices.  Often #2 is faster (and less error-prone), but it's slower for large graphs.  I hope somebody can lend some advice that will speed up our resetting-the-graph workflow with JanusGraph.

For deleting 6K nodes (and many incident edges), here's the timing data:

2022-05-05 16:40:44,261 - INFO - Deleting batch 1.

2022-05-05 16:41:09,961 - INFO - Deleting batch 2.

2022-05-05 16:41:27,689 - INFO - Deleting batch 3.

2022-05-05 16:41:43,678 - INFO - Deleting batch 4.

2022-05-05 16:41:45,561 - INFO - Deleted 6226 vertices over 4 batch(es).


...so it takes roughly 1 minute to delete 6K vertices in batches of 2000.

Here's our Python code for deleting the nodes:

        batches = 0
        nodes = 0
        while True:
            batches += 1
            log(f'Deleting batch {batches}.')
            num_nodes = g.V().limit(batch_size).sideEffect(__.drop()).count().next()
            nodes += num_nodes
            if num_nodes < batch_size:
                break
        log(f'Deleted {nodes} nodes over {batches} batch(es).')

This never fails, but it's obviously quite slow, especially for larger graphs.  Is there a way to speed this up?  We haven't tried running it async, since we're not sure how to do so safely.

Thanks in advance for any wisdom!

Scott


Re: Log4j Vulnerability for janusgraph

Vinayak Bali
 

Hi, 

Try updating the jar file, I mean deleting the earlier one and placing the new one in the lib folder. There might be some other changes required, I'm not sure. I was working with JanusGraph before but have since shifted to another graph database. I hope this helps.

Thanks and Regards,
Vinayak

On Wed, 27 Apr 2022, 10:37 pm , <nidhi.vinaykiya27@...> wrote:
@ronnie i agree. Patch release would help. But we are not sure when that’s going to come. Do we have any other alternative?


Re: Log4j Vulnerability for janusgraph

nidhi.vinaykiya27@...
 

@ronnie i agree. Patch release would help. But we are not sure when that’s going to come. Do we have any other alternative?


Re: Log4j Vulnerability for janusgraph

nidhi.vinaykiya27@...
 

Even the latest release, janusgraph 0.6.1, uses log4j 1.2.x. How were you able to fix it?
