
Re: Threaded Operations - Quarkus

Joe Obernberger
 

So - unsurprisingly, Boxuan is correct.
Code like this:
GraphTraversalSource traversal = StaticInfo.getGraph().newTransaction().traversal();
try {
    datasourceVertex = traversal.V().has("someID", id).next();
} catch (java.util.NoSuchElementException nse) {
    datasourceVertex = traversal.addV("source").property("someID", id).next();
}

being called from multiple threads results in several vertices with the same 'someID'.

Not sure how to fix this.
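
One possible mitigation, as long as all writers run inside a single JVM, is to serialize the lookup, add and commit for each 'someID' behind a per-key lock. The sketch below only illustrates that idea under stated assumptions: GetOrCreateDemo and KeyedLocks are hypothetical names, a list stands in for the graph, and no JanusGraph API is used. It does nothing for writers running in other JVMs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

class GetOrCreateDemo {
    // Hypothetical helper: one lock object per key, valid inside ONE JVM only.
    static final class KeyedLocks {
        private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

        // Runs the action while holding the lock that belongs to this key.
        <T> T withLock(String key, Supplier<T> action) {
            Object lock = locks.computeIfAbsent(key, k -> new Object());
            synchronized (lock) {
                return action.get();
            }
        }
    }

    // Stand-in for the graph: one entry per vertex, holding its "someID".
    static final List<String> vertices = new CopyOnWriteArrayList<>();
    static final KeyedLocks locks = new KeyedLocks();

    // Look up, then add only if missing; both steps run under the key's lock.
    static String getOrCreate(String id) {
        return locks.withLock(id, () -> {
            if (!vertices.contains(id)) {
                vertices.add(id); // real service: add the vertex AND commit here
            }
            return id;
        });
    }

    // Calls getOrCreate for the same id from several threads and
    // returns how many vertices exist afterwards.
    static int runConcurrently(String id, int threads) {
        vertices.clear();
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> getOrCreate(id));
            pool.add(t);
            t.start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return vertices.size();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently("abc", 8)); // prints 1
    }
}
```

In a real service the commit would have to happen inside the locked section; otherwise the next thread still cannot see the new vertex when it does its lookup.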

-Joe

On 6/17/2022 10:28 AM, Joe Obernberger via lists.lfaidata.foundation wrote:

Good stuff - thank you Boxuan.
Backend is Cassandra running on bare metal on 15 nodes.
Race condition is rare.
When the race condition happens, I'm seeing duplicate nodes/edges; basically the graph becomes invalid.
Yes.  This is a good idea.  I could write a Spark job to examine the graph and fix up discrepancies.  Smart.
Not sure what a locking service is.  Example?

My current plan (not tested yet) is to use a static class that contains the JanusGraph 'graph'.  On Quarkus when a REST call comes in, a new thread is created.  That thread will use Marc's idea of
GraphTraversalSource traversal = StaticInfo.getGraph().newTransaction().traversal();

Do stuff and then traversal.tx().commit();
That will be done in a loop so that if the commit fails, it will retry X times.

At least that's my current plan.  Not sure if it will work.

-Joe

On 6/17/2022 8:52 AM, Boxuan Li wrote:
Hi Joe,

Unfortunately, the approach Marc suggests won’t help with your use case. To be honest, I would have suggested the same answer as Marc before I saw your second post. If one JVM thread handles multiple transactions (I'm not familiar with Quarkus, so I'm not sure whether that is possible), then one has to do what Marc suggested. But in your use case, it won't be any different from your current usage, because JanusGraph automatically creates a thread-bound transaction for each thread (using ThreadLocal) when you use the traversal object.

The real issue in your use case is that you want ACID support, which really depends on your backend storage. At least in our officially supported Cassandra, HBase, and BigTable adapters, this is not (yet) supported.

There are a few workarounds, though. Before discussing that further, I would like to ask a few questions:

  1. What is your backend storage and is it distributed?
  2. How often does this “race condition” happen? Is it very rare or fairly common?
  3. What is your end goal? Do you want to reduce the chance of this “race condition”, or do you want to make sure it does not happen at all?
  4. Are you willing to resolve such duplicate vertices/edges at either read time or offline?
  5. Are you willing to introduce a third dependency, e.g. a distributed locking service?

Best,
Boxuan

From: janusgraph-users@... <janusgraph-users@...> on behalf of Joe Obernberger via lists.lfaidata.foundation <joseph.obernberger=gmail.com@...>
Sent: Friday, June 17, 2022 8:12:04 AM
To: janusgraph-users@... <janusgraph-users@...>
Subject: Re: [janusgraph-users] Threaded Operations - Quarkus
 

Thank you Marc.  I'm currently doing everything with a traversal, and then doing a traversal.tx().commit()
Sounds like what you suggested is what I want, but just to be clear:
Here's what I'm trying to do.

Thread 1/JVM1 gets a request that requires adding new vertices and edges to the graph.
Thread 2/JVM1 gets a similar request. 
Some of the vertices added in Thread 1 end up having the same attributes/name as vertices from Thread 2, but I only want to have one vertex if it's going to have the same attributes.
If Thread 1 adds that vertex but has not committed yet, then Thread 2, when it looks up said vertex, won't find it; so it will also add it.

Code example (traversal is a GraphTraversalSource obtained via traversal() on the graph opened with JanusGraphFactory)

try {
    correlationVertex = traversal.V().has("correlationID", correlationID).next();
} catch (java.util.NoSuchElementException nse) {
    correlationVertex = null;
}

.
.
.

if (correlationVertex == null) {
    correlationVertex = traversal.addV("correlation").property("correlationID", correlationID).next();
    correlationVertex.property("a", blah1);
    correlationVertex.property("b", blah2);
}

I do similar things with edges:

try {
    dataSourceToCorrelationEdge = traversal.E().has("edgeID", edgeID).next();
} catch (NoSuchElementException nse) {
    dataSourceToCorrelationEdge = null;
}

Ultimately, I'd like to have several JVMs handling these requests, each of which runs multiple threads.
I'll look at using a new transaction per call.  Thank you!

-Joe

On 6/17/2022 8:01 AM, hadoopmarc@... wrote:
Hi Joe,

By threadsafe transactions, do you mean that requests from different client threads should be handled independently, that is, in different JanusGraph transactions?

In that case, I think you want to use a GraphTraversalSource per request like this:

g = graph.newTransaction().traversal()

Best wishes,    Marc



AVG logo

This email has been checked for viruses by AVG antivirus software.
www.avg.com



Re: Threaded Operations - Quarkus

Joe Obernberger
 

Good stuff - thank you Boxuan.
Backend is Cassandra running on bare metal on 15 nodes.
Race condition is rare.
When the race condition happens, I'm seeing duplicate nodes/edges; basically the graph becomes invalid.
Yes.  This is a good idea.  I could write a Spark job to examine the graph and fix up discrepancies.  Smart.
Not sure what a locking service is.  Example?

My current plan (not tested yet) is to use a static class that contains the JanusGraph 'graph'.  On Quarkus when a REST call comes in, a new thread is created.  That thread will use Marc's idea of
GraphTraversalSource traversal = StaticInfo.getGraph().newTransaction().traversal();

Do stuff and then traversal.tx().commit();
That will be done in a loop so that if the commit fails, it will retry X times.

At least that's my current plan.  Not sure if it will work.
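
A minimal sketch of the retry loop described above, assuming each attempt opens its own transaction, does its work and commits, and that a failed commit surfaces as a RuntimeException. CommitRetry and withRetries are hypothetical names, and the commit failure is simulated, so no JanusGraph API is involved.

```java
import java.util.function.Supplier;

class CommitRetry {
    // Runs one unit of work up to maxAttempts times and returns its result.
    // The work is expected to open a transaction, apply its changes and commit;
    // a RuntimeException is treated as "commit failed, try again".
    static <T> T withRetries(int maxAttempts, Supplier<T> work) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.get();
            } catch (RuntimeException e) {
                last = e; // real service: roll back the failed transaction here
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetries(3, () -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("simulated commit failure");
            }
            return "committed";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "committed after 3 attempts"
    }
}
```

Each attempt should repeat the lookup as well as the write, so that a retry sees whatever another thread committed in the meantime. A retry on its own does not prevent duplicates if the backend accepts both commits.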

-Joe




Re: Threaded Operations - Quarkus

Boxuan Li
 

Hi Joe,

Unfortunately, the approach Marc suggests won’t help with your use case. To be honest, I would have suggested the same answer as Marc before I saw your second post. If one JVM thread handles multiple transactions (I'm not familiar with Quarkus, so I'm not sure whether that is possible), then one has to do what Marc suggested. But in your use case, it won't be any different from your current usage, because JanusGraph automatically creates a thread-bound transaction for each thread (using ThreadLocal) when you use the traversal object.
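
The ThreadLocal pattern mentioned here can be illustrated with plain Java. This is not JanusGraph's code, only a sketch of the general mechanism: ThreadBoundTxDemo and FakeTx are hypothetical names, and FakeTx stands in for a transaction. Every thread that touches the shared `current` object gets its own instance, even though all threads go through the same field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadBoundTxDemo {
    static final AtomicInteger opened = new AtomicInteger();

    // Stand-in for a transaction; each instance gets a unique id.
    static final class FakeTx {
        final int id = opened.incrementAndGet();
    }

    // One FakeTx per thread, created lazily on that thread's first access.
    static final ThreadLocal<FakeTx> current = ThreadLocal.withInitial(FakeTx::new);

    // Starts n threads that each use the shared object twice and
    // returns the distinct transaction ids they observed.
    static Set<Integer> idsSeenBy(int n) {
        Set<Integer> ids = ConcurrentHashMap.newKeySet();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                ids.add(current.get().id);
                ids.add(current.get().id); // same thread, same transaction
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(idsSeenBy(4).size()); // prints 4
    }
}
```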

The real issue in your use case is that you want ACID support, which really depends on your backend storage. At least in our officially supported Cassandra, HBase, and BigTable adapters, this is not (yet) supported.

There are a few workarounds, though. Before discussing that further, I would like to ask a few questions:

  1. What is your backend storage and is it distributed?
  2. How often does this “race condition” happen? Is it very rare or fairly common?
  3. What is your end goal? Do you want to reduce the chance of this “race condition”, or do you want to make sure it does not happen at all?
  4. Are you willing to resolve such duplicate vertices/edges at either read time or offline?
  5. Are you willing to introduce a third dependency, e.g. a distributed locking service?
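
Question 4 refers to cleaning up duplicates after the fact. A minimal sketch of the detection step, assuming the vertices have already been exported as (internal id, business key) pairs: DuplicateFinder and VertexRef are hypothetical names, and the rule "keep the first vertex seen for each key" is an arbitrary choice made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateFinder {
    // Stand-in for a vertex: its internal id plus the business key ("someID").
    static final class VertexRef {
        final long vertexId;
        final String someId;

        VertexRef(long vertexId, String someId) {
            this.vertexId = vertexId;
            this.someId = someId;
        }
    }

    // Groups vertices by key. For every key seen more than once, the first
    // vertex is kept and the remaining ids are reported as duplicates.
    static Map<String, List<Long>> duplicatesByKey(List<VertexRef> vertices) {
        Map<String, List<Long>> byKey = new LinkedHashMap<>();
        for (VertexRef v : vertices) {
            byKey.computeIfAbsent(v.someId, k -> new ArrayList<>()).add(v.vertexId);
        }
        Map<String, List<Long>> duplicates = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> e : byKey.entrySet()) {
            List<Long> ids = e.getValue();
            if (ids.size() > 1) {
                duplicates.put(e.getKey(), new ArrayList<>(ids.subList(1, ids.size())));
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<VertexRef> sample = List.of(
                new VertexRef(1L, "a"),
                new VertexRef(2L, "b"),
                new VertexRef(3L, "a"));
        System.out.println(duplicatesByKey(sample)); // prints {a=[3]}
    }
}
```

A real clean-up job would also have to move the edges and properties of each duplicate onto the vertex that is kept before dropping it.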

Best,
Boxuan





Re: Threaded Operations - Quarkus

Joe Obernberger
 

Thank you Marc.  I'm currently doing everything with a traversal, and then doing a traversal.tx().commit()
Sounds like what you suggested is what I want, but just to be clear:
Here's what I'm trying to do.

Thread 1/JVM1 gets a request that requires adding new vertices and edges to the graph.
Thread 2/JVM1 gets a similar request. 
Some of the vertices added in Thread 1 end up having the same attributes/name as vertices from Thread 2, but I only want to have one vertex if it's going to have the same attributes.
If Thread 1 adds that vertex but has not committed yet, then Thread 2, when it looks up said vertex, won't find it; so it will also add it.
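
The interleaving described above can be replayed deterministically with a small stand-in model. This is only a sketch under simplifying assumptions: LostLookupDemo and FakeTx are hypothetical names, a transaction sees committed data plus its own pending writes, and a commit never fails. It shows why both transactions end up adding the vertex.

```java
import java.util.ArrayList;
import java.util.List;

class LostLookupDemo {
    // Committed state visible to everyone: one entry per vertex (its correlationID).
    static final List<String> committed = new ArrayList<>();

    // Stand-in for a transaction: reads see committed data plus own pending writes.
    static final class FakeTx {
        private final List<String> pending = new ArrayList<>();

        boolean exists(String id) {
            return committed.contains(id) || pending.contains(id);
        }

        void add(String id) {
            pending.add(id);
        }

        void commit() {
            committed.addAll(pending);
            pending.clear();
        }
    }

    // Both transactions do their lookup before either one commits.
    static int replay(String id) {
        committed.clear();
        FakeTx t1 = new FakeTx();
        FakeTx t2 = new FakeTx();
        if (!t1.exists(id)) t1.add(id); // thread 1: not found, add (uncommitted)
        if (!t2.exists(id)) t2.add(id); // thread 2: cannot see t1's pending vertex
        t1.commit();
        t2.commit();
        return committed.size();
    }

    public static void main(String[] args) {
        System.out.println(replay("c-42")); // prints 2: the vertex exists twice
    }
}
```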

Code example (traversal is a GraphTraversalSource obtained via traversal() on the graph opened with JanusGraphFactory)

try {
    correlationVertex = traversal.V().has("correlationID", correlationID).next();
} catch (java.util.NoSuchElementException nse) {
    correlationVertex = null;
}

.
.
.

if (correlationVertex == null) {
    correlationVertex = traversal.addV("correlation").property("correlationID", correlationID).next();
    correlationVertex.property("a", blah1);
    correlationVertex.property("b", blah2);
}

I do similar things with edges:

try {
    dataSourceToCorrelationEdge = traversal.E().has("edgeID", edgeID).next();
} catch (NoSuchElementException nse) {
    dataSourceToCorrelationEdge = null;
}

Ultimately, I'd like to have several JVMs handling these requests, each of which runs multiple threads.
I'll look at using a new transaction per call.  Thank you!

-Joe




Re: Threaded Operations - Quarkus

hadoopmarc@...
 

Hi Joe,

By threadsafe transactions, do you mean that requests from different client threads should be handled independently, that is, in different JanusGraph transactions?

In that case, I think you want to use a GraphTraversalSource per request like this:

g = graph.newTransaction().traversal()

Best wishes,    Marc


Threaded Operations - Quarkus

Joe Obernberger
 

Hi All - building a REST service using Quarkus to handle requests that operate on a graph.  The current approach is:

Static class that contains the JanusGraph and GraphTraversalSource objects that are created once per VM.  Use those objects when a request comes in to add vertices, edges, and properties, and commit when completed.
Since Quarkus can be called via multiple threads, what is the best approach to make sure the transactions are thread safe?  I'm looking here (https://docs.janusgraph.org/interactions/transactions/), but not sure of the best approach.
Thank you!

-Joe







JanusGraph Discord Server

Florian Hockmann
 

Hi,

We have created a Discord Server for JanusGraph to better support interactive conversations. So, if you would like to talk with other users and contributors of JanusGraph, then you can use Discord for that from now on.

Please join the server via this invite link:

https://discord.gg/5n4fjv4QAf

 

Regards,

Florian


[ANNOUNCE] JanusGraph 0.6.2 Release

Oleksandr Porunov
 

The JanusGraph Technical Steering Committee is excited to announce the release of JanusGraph 0.6.2.

JanusGraph is an Apache TinkerPop enabled property graph database with support for a variety of storage and indexing backends. Thank you to all of the contributors.

The release artifacts can be found at this location:
    https://github.com/JanusGraph/janusgraph/releases/tag/v0.6.2

A full binary distribution is provided for user convenience:
    https://github.com/JanusGraph/janusgraph/releases/download/v0.6.2/janusgraph-full-0.6.2.zip

A truncated binary distribution is provided:
    https://github.com/JanusGraph/janusgraph/releases/download/v0.6.2/janusgraph-0.6.2.zip

The online docs can be found here:
    https://docs.janusgraph.org
 
To view the resolved issues and commits check the milestone here:
    https://github.com/JanusGraph/janusgraph/milestone/23?closed=1

Thank you very much,
Oleksandr Porunov
on behalf of JanusGraph TSC


Re: [Tech Blog] JanusGraph Deep Dive: Optimize Edge Queries

hadoopmarc@...
 


Great stuff! It is really motivating to gain a much deeper understanding of the JanusGraph inner workings and get practical advice at the same time.

Marc


[Tech Blog] JanusGraph Deep Dive: Optimize Edge Queries

Boxuan Li
 

I just wrote a blog post to explain the internals of edges in JanusGraph and give a few examples of how to optimize edge queries. Here is the Medium blog post: JanusGraph Deep Dive (Part 3): Optimize Edge Queries
Check it out if you are interested in 1) how JanusGraph stores edges, 2) how JanusGraph handles edge queries with predicate pushdowns, and 3) how the Vertex-Centric Index works! I am happy to answer any questions here too.

Best regards,
Boxuan


Re: Logging vertex program

Nikita Pande
 

Hi Marc,


Exactly, I was surprised to see the logs in stderr.

Thanks,
Nikita

On Wed, 1 Jun, 2022, 9:04 pm , <hadoopmarc@...> wrote:
Hi Nikita,

Do you use the Spark web UI? In the Executors tab you can follow the stderr link and see any logged or printed output. No idea why they use stderr.

Best wishes,    Marc


Re: Logging vertex program

hadoopmarc@...
 

Hi Nikita,

Do you use the Spark web UI? In the Executors tab you can follow the stderr link and see any logged or printed output. No idea why they use stderr.

Best wishes,    Marc


Re: upgrade gremlin version

hadoopmarc@...
 

Hi Senthilkumar,

As far as I remember, the Gremlin.version() output of 1.2.1 in the Gremlin Console of the JanusGraph distribution is caused by a bug somewhere. You can look in the lib folder and see that janusgraph-0.6.1 uses gremlin-3.5.1. gremlin-3.6.x will become available in a later JanusGraph release. It is neither easy nor advisable to try to upgrade the Gremlin version yourself.

Best wishes,   Marc


upgrade gremlin version

senthilkmrr@...
 

The latest JanusGraph version reports Gremlin version 1.2, but the latest Gremlin version is 3.6. How can I upgrade to the latest Gremlin version?
--
Senthilkumar


Logging vertex program

Nikita Pande
 

Hi team,

I am trying to add logs in a vertex program, but can't find them on the Spark executors or in the JanusGraph Server logs.
How can I get the vertex program logs?

Thanks and Regards,
Nikita


Re: Janusgraph cluster set-up

hadoopmarc@...
 

Hi Senthilkumar,

Sounds like the "getting started deployment scenario", see https://docs.janusgraph.org/operations/deployment/#getting-started-scenario

JanusGraph servers only need to have the exact same config and will cooperate automagically. Setting up the ScyllaDB cluster is described elsewhere. Setting up load balancing or a virtual IP is described elsewhere.

Best wishes,    Marc


Janusgraph cluster set-up

senthilkmrr@...
 

Please let me know how to create a multi-node JanusGraph cluster set-up. I am using Cassandra/ScyllaDB for data storage.
--
Senthilkumar


New committer: Clement de Groc

Boxuan Li
 

On behalf of the JanusGraph Technical Steering Committee (TSC), I'm pleased to welcome a new committer to the project!

Clement de Groc has been a solid contributor. He has already contributed many performance improvements and bug fixes, and has provided many PR reviews.

Congratulations, Clement!


Re: Hbase read after write not working with janusgraph-0.6.1 but was working with janusgraph-0.6.1

hadoopmarc@...
 

Hi Nikita,

JanusGraph can be run as a server or in embedded mode. gremlin-server.yaml is for configuring JanusGraph Server. With JanusGraphFactory in the Gremlin Console you instantiate embedded JanusGraph. In the latter case you can query JanusGraph without a remote connection to JanusGraph Server.

See: https://docs.janusgraph.org/operations/deployment/

Marc


Re: Hbase read after write not working with janusgraph-0.6.1 but was working with janusgraph-0.6.1

Nikita Pande
 

Hi @hadoopmarc,

Is there a difference between configuring the JanusGraph Server conf, i.e. gremlin-server.yaml:
graphs: {
  graph: /etc/opt/janusgraph/janusgraph.properties
}

and configuring it from the console, e.g. graph2=GraphFactory.open("/etc/opt/janusgraph/janusgraph.properties")?
