Re: Getting Edges - Performance
It's usually small; typically around 3 edges.
I've been reading your article
(https://li-boxuan.medium.com/janusgraph-deep-dive-part-3-speed-up-edge-queries-3b9eb5ba34f8).
Outbound edges, however, could be numerous: 100s to 10,000s.
-Joe
On 6/22/2022 4:49 PM, Boxuan Li wrote:
When it takes over 3 seconds to return, how many edges does it return?
This email has been checked for viruses by AVG antivirus software.
www.avg.com
Re: Getting Edges - Performance
When it takes over 3 seconds to return, how many edges does it return?
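To answer that question and measure the delay in the same pass, one option is to count and time the iteration together. The sketch below is a hypothetical helper (IterationStats is not part of JanusGraph or TinkerPop). It takes a Supplier so that both the edges() call and the draining of the iterator are timed, since, depending on the backend, the storage work may happen in either place.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical helper (not a JanusGraph/TinkerPop API): counts the elements an
// iterator yields and measures how long it took to obtain and drain it.
final class IterationStats {
    final long count;
    final long elapsedNanos;

    private IterationStats(long count, long elapsedNanos) {
        this.count = count;
        this.elapsedNanos = elapsedNanos;
    }

    // The Supplier is invoked inside the timed region, so the cost of creating
    // the iterator (e.g. vert.edges(Direction.IN)) is included with iteration.
    static <T> IterationStats measure(Supplier<Iterator<T>> source) {
        long start = System.nanoTime();
        Iterator<T> it = source.get();
        long count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return new IterationStats(count, System.nanoTime() - start);
    }

    double elapsedMillis() {
        return elapsedNanos / 1_000_000.0;
    }
}
```

For example, `IterationStats s = IterationStats.measure(() -> vert.edges(Direction.IN));` followed by logging `s.count` and `s.elapsedMillis()` would show whether the 3-second cases line up with vertices that have unusually many edges.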
Getting Edges - Performance
From: Joe Obernberger
Sent: Wednesday, June 22, 2022 1:06 PM
Hi All - I'm seeing a performance issue with this statement in Java code:
Iterator<Edge> edgeIt = vert.edges(Direction.IN);
In some cases this is taking over 3 seconds to return. What can I do to improve this performance?
Thank you!
-Joe
Re: Threaded Operations - Quarkus
Thanks for all the help on this. I'm coming closer to a solution thanks to you all.
Question - I've been using a GraphTraversalSource for all of the vertex and edge additions to my graph. Example (graph is my JanusGraph instance):
GraphTraversalSource traversal = graph.tx().createThreadedTx().traversal();
Is it better to use graph.tx().createThreadedTx() directly?
-Joe
Re: Threaded Operations - Quarkus
On 6/17/2022 3:03 PM, Boxuan Li wrote:
Yeah, using `newTransaction()` won't make a difference in your use case. Based on your input, there are a couple of things you could try:
- As suggested by Kevin, you could use locking. See https://docs.janusgraph.org/advanced-topics/eventual-consistency/#data-consistency.
It is slow, but it will hopefully solve most of the race conditions you have. Based on my understanding of Cassandra's nature, I think you could still see such inconsistencies, but the chance is certainly much lower.
- You could periodically identify and remove the inconsistencies using an offline pipeline.
- You could use an external locking service on the client side; for example, use Redis to make sure a conflicting transaction doesn't start in the first place.
These solutions have their own pros & cons, so it really depends on your requirements.
Best,
Boxuan
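To make the external-locking option more concrete: every writer, in every JVM, first acquires a lock named after the business key (for example `"someID:" + id`), then does its lookup-or-create and commit, and only then releases the lock. The sketch below is hypothetical: `LockService` and `InMemoryLockService` are invented names, and the in-memory version only stands in for a shared service so the logic can be tested in one process. With Redis, acquisition is commonly done with `SET key token NX PX ttl` and release with a compare-and-delete script; treat that wiring as an assumption, not something settled in this thread.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lease-based lock service. A real deployment would back this
// with something every JVM can reach (e.g. Redis); this one is in-memory.
interface LockService {
    // Returns a token if the lock was acquired, empty if someone else holds it.
    Optional<String> tryAcquire(String key, long ttlMillis);

    // Releases the lock only if `token` still identifies the current holder.
    boolean release(String key, String token);
}

final class InMemoryLockService implements LockService {
    private static final class Lease {
        final String token;
        final long expiresAtMillis;

        Lease(String token, long expiresAtMillis) {
            this.token = token;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Lease> leases = new ConcurrentHashMap<>();

    @Override
    public Optional<String> tryAcquire(String key, long ttlMillis) {
        long now = System.currentTimeMillis();
        Lease candidate = new Lease(UUID.randomUUID().toString(), now + ttlMillis);
        // compute() is atomic per key: take the lock only if it is free or expired.
        Lease holder = leases.compute(key, (k, current) ->
                (current == null || current.expiresAtMillis <= now) ? candidate : current);
        return holder == candidate ? Optional.of(candidate.token) : Optional.empty();
    }

    @Override
    public boolean release(String key, String token) {
        // Compare-and-delete: never release a lock that someone else now holds.
        Lease current = leases.get(key);
        return current != null && current.token.equals(token) && leases.remove(key, current);
    }
}
```

The TTL has to comfortably exceed the time a lookup, create, and commit can take. If a lease expires mid-transaction (for example during a long GC pause), two writers can still overlap, so this narrows the race rather than eliminating it.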
Re: Threaded Operations - Quarkus
This is from Titan, the predecessor to JanusGraph, but see https://groups.google.com/g/aureliusgraphs/c/z6kyGSlifXE/m/aLc2Zwb_BAAJ for some experience with a similar issue.
You can either use locking, and be slow and incur its other downsides (particularly if you want to do this across JVMs), or accept that you will have some (small?) risk of duplicates and either deal with them in your traversals or periodically identify and remove them.
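The "identify and remove them periodically" option boils down to grouping vertices by the property that is supposed to be unique and keeping one per group. Below is a minimal sketch, assuming you can export (vertex id, key) pairs from the graph, for instance with the Spark job Joe mentions in this thread. `KeyedVertex` and `DuplicateFinder` are hypothetical names, and the actual merge (re-pointing edges onto the surviving vertex, then dropping the extras) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal view of a vertex: its graph id and the business key
// (e.g. the "someID" or "correlationID" property) that should be unique.
final class KeyedVertex {
    final long id;
    final String key;

    KeyedVertex(long id, String key) {
        this.id = id;
        this.key = key;
    }
}

final class DuplicateFinder {
    // Deterministic tie-break: the smallest id wins. Using the same rule at
    // read time and in the cleanup job keeps both agreeing on the survivor.
    static long canonicalId(List<KeyedVertex> sameKey) {
        long best = Long.MAX_VALUE;
        for (KeyedVertex v : sameKey) {
            best = Math.min(best, v.id);
        }
        return best;
    }

    // Returns key -> ids of the non-canonical duplicates for that key.
    static Map<String, List<Long>> findDuplicates(List<KeyedVertex> vertices) {
        Map<String, List<KeyedVertex>> byKey = new TreeMap<>();
        for (KeyedVertex v : vertices) {
            byKey.computeIfAbsent(v.key, k -> new ArrayList<>()).add(v);
        }
        Map<String, List<Long>> duplicates = new TreeMap<>();
        for (Map.Entry<String, List<KeyedVertex>> e : byKey.entrySet()) {
            if (e.getValue().size() < 2) {
                continue; // key is already unique
            }
            long keep = canonicalId(e.getValue());
            List<Long> extras = new ArrayList<>();
            for (KeyedVertex v : e.getValue()) {
                if (v.id != keep) {
                    extras.add(v.id);
                }
            }
            duplicates.put(e.getKey(), extras);
        }
        return duplicates;
    }
}
```

Picking the smallest id is arbitrary. What matters is that code which "deals with it in the traversals" applies the same rule, so readers pick the vertex the cleanup job will keep.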
Re: Threaded Operations - Quarkus
From: Joe Obernberger
Sent: Friday, June 17, 2022 2:24 PM
So - unsurprisingly, Boxuan is correct.
Code like this:
GraphTraversalSource traversal = StaticInfo.getGraph().newTransaction().traversal();
try {
    datasourceVertex = traversal.V().has("someID", id).next();
} catch (java.util.NoSuchElementException nse) {
    datasourceVertex = traversal.addV("source").property("someID", id).next();
}
being called from multiple threads results in several vertices with the same 'someID'.
Not sure how to fix this.
-Joe
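Within a single JVM, one way to close this particular race is to serialize the lookup-or-create per `someID` value. The sketch below uses a fixed array of `ReentrantLock` stripes (a standard lock-striping technique; `StripedKeyLocks` is a hypothetical name). It assumes the code passed to `withLock` opens its transaction, does the lookup and create, and commits before returning; releasing the lock before the commit would let another thread's lookup miss the uncommitted vertex again. It does nothing across several JVMs, which still needs a shared lock or an offline cleanup.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical helper: bounded set of locks; equal keys always share a lock,
// so at most one thread in this JVM runs lookup-or-create for a given key.
final class StripedKeyLocks {
    private final ReentrantLock[] stripes;

    StripedKeyLocks(int stripeCount) {
        if (stripeCount < 1) {
            throw new IllegalArgumentException("stripeCount must be >= 1");
        }
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    <T> T withLock(String key, Supplier<T> work) {
        // Unrelated keys may land on the same stripe: that costs some
        // concurrency but never correctness.
        ReentrantLock lock = stripes[Math.floorMod(key.hashCode(), stripes.length)];
        lock.lock();
        try {
            // `work` should open a new transaction, look up, create if
            // missing, and commit before returning.
            return work.get();
        } finally {
            lock.unlock();
        }
    }
}
```

In Joe's code, the whole try/catch block plus `traversal.tx().commit()` would go inside one `withLock("someID:" + id, ...)` call.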
Re: Threaded Operations - Quarkus
On 6/17/2022 10:28 AM, Joe Obernberger wrote:
Good stuff - thank you Boxuan.
Backend is Cassandra running on bare metal on 15 nodes.
The race condition is rare.
When the race condition happens, I'm seeing duplicate nodes/edges; basically, the graph becomes invalid.
Yes, resolving them offline is a good idea. I could write a Spark job to examine the graph and fix up discrepancies. Smart.
I'm not sure what a locking service is. Can you give an example?
My current plan (not tested yet) is to use a static class that contains the JanusGraph 'graph'. On Quarkus, when a REST call comes in, a new thread is created. That thread will use Marc's idea of
GraphTraversalSource traversal = StaticInfo.getGraph().newTransaction().traversal();
Do stuff, and then traversal.tx().commit();
That will be done in a loop so that if the commit fails, it will retry X times.
At least that's my current plan. Not sure if it will work.
-Joe
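The retry loop in this plan could look roughly like the helper below (`Retry.withRetries` is hypothetical). Two assumptions are baked in. First, each attempt must start a new transaction and redo its lookups, since a transaction whose commit failed should be rolled back rather than reused. Second, a retry only helps when the conflicting commit actually fails; without the locking Boxuan mentions, two threads creating the same 'someID' will typically both commit successfully, so retries alone won't prevent duplicates.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for "do stuff, commit, and retry up to X times on failure".
final class Retry {
    static <T> T withRetries(int maxAttempts, long baseDelayMillis, Callable<T> unitOfWork)
            throws Exception {
        if (maxAttempts < 1 || baseDelayMillis < 0) {
            throw new IllegalArgumentException("need maxAttempts >= 1 and baseDelayMillis >= 0");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // unitOfWork is expected to open a NEW transaction, redo its
                // lookups, write, and commit (rolling back if anything throws).
                return unitOfWork.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Exponential backoff plus random jitter so competing
                    // writers do not retry in lockstep.
                    long delay = baseDelayMillis << (attempt - 1);
                    Thread.sleep(delay + ThreadLocalRandom.current().nextLong(delay + 1));
                }
            }
        }
        throw last; // every attempt failed: surface the most recent cause
    }
}
```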
Re: Threaded Operations - Quarkus
On 6/17/2022 8:52 AM, Boxuan Li wrote:
Hi Joe,
Unfortunately, the approach Marc suggests won't help with your use case. To be honest, I would have suggested the same answer as Marc before I saw your second post. If one JVM thread handles multiple transactions (I'm not familiar with Quarkus, so I'm not sure if that is possible), then one has to do what Marc suggested. But in your use case, it won't be any different from your current usage, because JanusGraph automatically creates a separate transaction for each thread (using ThreadLocal) when you use the traversal object.
The real issue in your use case is that you want ACID support, which really depends on your backend storage. At least in our officially supported Cassandra, HBase, and BigTable adapters, this is not (yet) supported.
There are a few workarounds, though. Before discussing that further, I would like to ask a few questions:
- What is your backend storage and is it distributed?
- How often does this “race condition” happen? Is it very rare or fairly common?
- What is your end goal? Do you want to reduce the chance of this “race condition”, or do you want to make sure it never happens at all?
- Are you willing to resolve such duplicate vertices/edges either at read time or offline?
- Are you willing to introduce an additional dependency, e.g. a distributed locking service?
Best,
Boxuan
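The point about ThreadLocal can be illustrated with a toy model. The class below is not JanusGraph code and its names are hypothetical; it only mimics the behavior described above: a shared object hands each calling thread its own lazily created transaction. Two request threads sharing one traversal source are therefore already in separate transactions, which is why switching to newTransaction() changes little in a one-thread-per-request design.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model (not JanusGraph code) of thread-bound transactions: a shared
// object hands each calling thread its own lazily created transaction.
final class ThreadBoundTxModel {
    static final class Tx {
        final int id;
        final String ownerThread;

        Tx(int id, String ownerThread) {
            this.id = id;
            this.ownerThread = ownerThread;
        }
    }

    private final AtomicInteger nextId = new AtomicInteger();
    private final ThreadLocal<Tx> current = ThreadLocal.withInitial(
            () -> new Tx(nextId.incrementAndGet(), Thread.currentThread().getName()));

    // Analogous to the implicit transaction behind graph.traversal():
    // same thread -> same transaction; different thread -> different one.
    Tx tx() {
        return current.get();
    }

    // Analogous to commit()/rollback(): ends this thread's transaction so the
    // next tx() call on this thread starts a fresh one.
    void close() {
        current.remove();
    }
}
```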
Re: Threaded Operations - Quarkus
Thank you Marc. I'm currently doing everything with a traversal,
and then doing a traversal.tx().commit()
Sounds like what you suggested is what I want, but just to be
clear:
Here's what I'm trying to do.
Thread 1/JVM1 gets a request that requires adding new vertices
and edges to the graph.
Thread 2/JVM1 gets a similar request.
Some of the vertices added in Thread 1 end up having the same
attributes/name has vertices from Thread 2, but I only want to
have one vertex if it's going to have the same attributes.
If Thread 1 adds that vertex before it does a commit, then Thread
2, when it looks up said vertex won't find it; so it will also add
it.
Code example (traversal is a GraphTraversalSource gotten from
JanusGraphFactory.traversal())
try {
    correlationVertex = traversal.V().has("correlationID", correlationID).next();
} catch (java.util.NoSuchElementException nse) {
    correlationVertex = null;
}
// ...
if (correlationVertex == null) {
    correlationVertex = traversal.addV("correlation").property("correlationID", correlationID).next();
    correlationVertex.property("a", blah1);
    correlationVertex.property("b", blah2);
}
I do similar things with edges:
try {
    dataSourceToCorrelationEdge = traversal.E().has("edgeID", edgeID).next();
} catch (NoSuchElementException nse) {
    dataSourceToCorrelationEdge = null;
}
Ultimately, I'd like to have several JVMs handling these requests, each of which runs multiple threads.
I'll look at using a new transaction per call. Thank you!
-Joe
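The duplicate problem described above is a check-then-act race. Two threads both look up the same correlationID, both find nothing, and both add a vertex. Here is a minimal sketch of one way to close that gap inside a single JVM. It serializes "look up, else create and commit" per key, so the second thread's lookup only runs after the first thread has committed. This assumes all request threads share one helper instance. PerKeyGetOrCreate and its lookup/createAndCommit callbacks are hypothetical names, and a ConcurrentHashMap stands in for the committed graph state in the demo.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch only: serializes "look up, else create and commit" per key inside ONE JVM.
// lookup and createAndCommit are hypothetical callbacks standing in for the
// traversal.V().has(...) lookup and the addV(...) + commit() code in the thread.
final class PerKeyGetOrCreate<V> {
    // One lock object per key; ConcurrentHashMap.computeIfAbsent creates it atomically.
    private final Map<String, Object> locks = new ConcurrentHashMap<>();

    V getOrCreate(String key, Function<String, V> lookup, Function<String, V> createAndCommit) {
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            V existing = lookup.apply(key);
            if (existing != null) {
                return existing;
            }
            // Must commit before the lock is released; otherwise the next
            // thread's lookup still cannot see the new element.
            return createAndCommit.apply(key);
        }
    }

    // Demo: several threads race on the same key; returns how many creations happened.
    static int demo(int threads) {
        Map<String, String> committed = new ConcurrentHashMap<>(); // stands in for committed graph state
        AtomicInteger creations = new AtomicInteger();
        PerKeyGetOrCreate<String> helper = new PerKeyGetOrCreate<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> helper.getOrCreate(
                    "corr-1",
                    committed::get,
                    k -> {
                        creations.incrementAndGet();
                        String vertex = "vertex-for-" + k;
                        committed.put(k, vertex); // the "commit": visible to later lookups
                        return vertex;
                    }));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return creations.get();
    }
}
```

This does nothing for the multi-JVM case, because each JVM has its own locks. That case needs coordination in the database itself, and JanusGraph's documentation on data consistency and locking is the place to start. The locks map also grows by one entry per distinct key, which a real service would need to bound.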
Re: Threaded Operations - Quarkus
Hi Joe,
By thread-safe transactions, do you mean that requests from different client threads should be handled independently, that is, in different JanusGraph transactions?
In that case, I think you want to use a GraphTraversalSource per request like this:
g = graph.newTransaction().traversal()
Best wishes, Marc
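Marc's suggestion amounts to "one transaction per request" rather than one GraphTraversalSource shared by every thread. Here is a minimal sketch of that lifecycle: open a fresh transaction, run the request's work, commit on success, and roll back if the work throws. It uses a hypothetical Tx interface with commit() and rollback() as a stand-in. JanusGraphTransaction also has these methods, but this code does not depend on JanusGraph. RecordingTx is a hypothetical test double.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch of a per-request transaction lifecycle. Tx is a hypothetical stand-in
// interface; it is not a JanusGraph type.
final class PerRequestTx {
    interface Tx {
        void commit();
        void rollback();
    }

    // Opens a fresh transaction for this request only, runs the work, and
    // commits; if the work throws, rolls back and rethrows.
    static <T extends Tx, R> R inTransaction(Supplier<T> openTx, Function<T, R> work) {
        T tx = openTx.get();
        R result;
        try {
            result = work.apply(tx);
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
        tx.commit();
        return result;
    }

    // Test double that records which lifecycle calls were made.
    static final class RecordingTx implements Tx {
        final List<String> events = new ArrayList<>();
        public void commit() { events.add("commit"); }
        public void rollback() { events.add("rollback"); }
    }
}
```

With JanusGraph, openTx would wrap graph.newTransaction(), and the work function would obtain its traversal source with tx.traversal(), as in the one-liner above. Because each request gets its own transaction, threads never share uncommitted state. On its own, this does not prevent the duplicate-vertex race discussed in this thread.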
Threaded Operations - Quarkus
Hi All - I'm building a REST service using Quarkus to handle requests that operate on a graph. The current approach is:
A static class holds the JanusGraph and GraphTraversalSource objects, which are created once per JVM. Those objects are used when a request comes in to add vertices, edges, and properties, and the work is committed when it is complete.
Since Quarkus can call this code from multiple threads, what is the best approach to make sure the transactions are thread safe? I'm looking here (https://docs.janusgraph.org/interactions/transactions/), but I'm not sure of the best approach.
Thank you!
-Joe
JanusGraph Discord Server
Hi,
We have created a Discord server for JanusGraph to better support interactive conversations. If you would like to talk with other users and contributors of JanusGraph, you can use Discord for that from now on. Please join the server via this invite link: https://discord.gg/5n4fjv4QAf
Regards,
Florian
[ANNOUNCE] JanusGraph 0.6.2 Release

Oleksandr Porunov
The JanusGraph Technical Steering Committee is excited to announce the release of JanusGraph 0.6.2. JanusGraph is an Apache TinkerPop enabled property graph database with support for a variety of storage and indexing backends. Thank you to all of the contributors.
Re: [Tech Blog] JanusGraph Deep Dive: Optimize Edge Queries
Great stuff! It is really motivating to gain a much deeper understanding of JanusGraph's inner workings and get practical advice at the same time.
Marc
[Tech Blog] JanusGraph Deep Dive: Optimize Edge Queries
I just wrote a blog post that explains the internals of edges in JanusGraph and gives a few examples of how to optimize edge queries. Here is the Medium blog post: JanusGraph Deep Dive (Part 3): Optimize Edge Queries
Check it out if you are interested in 1) how JanusGraph stores edges, 2) how JanusGraph handles edge queries with predicate pushdown, and 3) how vertex-centric indexes work! I am happy to answer any questions here too.
Best regards,
Boxuan
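Here is a rough analogy for why a vertex-centric index can speed up edge queries. It is simplified and is not JanusGraph's storage code. If a vertex's edges are kept sorted by a property such as a timestamp, a predicate on that property can be answered by slicing a contiguous range instead of reading and filtering every edge. Edge, fullScan, sliceQuery, index, and the timestamp property are all hypothetical names, and a TreeMap stands in for the sorted adjacency list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified analogy only, not JanusGraph code: one vertex's incident edges,
// each with a hypothetical numeric "timestamp" property used as the sort key.
final class VertexCentricAnalogy {
    static final class Edge {
        final String id;
        final long timestamp;
        Edge(String id, long timestamp) { this.id = id; this.timestamp = timestamp; }
    }

    // Without an index: every edge is read and the predicate is applied to each one.
    static List<String> fullScan(List<Edge> edges, long from, long to) {
        List<String> out = new ArrayList<>();
        for (Edge e : edges) {
            if (e.timestamp >= from && e.timestamp < to) {
                out.add(e.id);
            }
        }
        return out;
    }

    // Builds the sorted view once, loosely analogous to maintaining an index on write.
    static NavigableMap<Long, List<String>> index(List<Edge> edges) {
        NavigableMap<Long, List<String>> sorted = new TreeMap<>();
        for (Edge e : edges) {
            sorted.computeIfAbsent(e.timestamp, k -> new ArrayList<>()).add(e.id);
        }
        return sorted;
    }

    // With edges sorted by the sort key, the same predicate is a contiguous slice [from, to).
    static List<String> sliceQuery(NavigableMap<Long, List<String>> byTimestamp, long from, long to) {
        List<String> out = new ArrayList<>();
        for (List<String> ids : byTimestamp.subMap(from, true, to, false).values()) {
            out.addAll(ids);
        }
        return out;
    }
}
```

Both approaches return the same edges. The difference is that fullScan touches every edge, while sliceQuery only touches the matching range. That matters most for vertices with hundreds or thousands of edges.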
Re: Logging vertex program
Hi Marc,
Exactly; I was surprised to see the logs in stderr.
Thanks, Nikita
Re: Logging vertex program
Hi Nikita,
Do you use the Spark web UI? In the Executors tab, you can follow the stderr link and see any logged or printed output. No idea why they use stderr.
Best wishes, Marc
Re: upgrade gremlin version
Hi Senthilkumar,
As I remember, the Gremlin.version() output of 1.2.1 in the Gremlin Console of the JanusGraph distribution is caused by a bug somewhere. You can look in the lib folder and see that janusgraph-0.6.1 uses gremlin-3.5.1. gremlin-3.6.x will become available in a later JanusGraph release. It is neither easy nor advisable to try to upgrade the Gremlin version yourself.
Best wishes, Marc
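Following the suggestion to check the lib folder, here is a small sketch that reports the bundled gremlin-core version from file names. It assumes the jar is named like gremlin-core-3.5.1.jar, with a plain numeric version and no suffix. It also assumes you pass the distribution's lib directory as the first argument. GremlinJarVersion and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: assumes jar names of the form gremlin-core-<major>.<minor>.<patch>.jar.
final class GremlinJarVersion {
    private static final Pattern GREMLIN_CORE =
            Pattern.compile("gremlin-core-(\\d+\\.\\d+\\.\\d+)\\.jar");

    // Pure helper: extracts the version from a collection of file names, if present.
    static Optional<String> versionFromNames(Collection<String> fileNames) {
        for (String name : fileNames) {
            Matcher m = GREMLIN_CORE.matcher(name);
            if (m.matches()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    // Lists gremlin-core jars in the given lib directory and parses their version.
    static Optional<String> versionInLibDir(Path libDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "gremlin-core-*.jar")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        return versionFromNames(names);
    }

    public static void main(String[] args) throws IOException {
        Path lib = Paths.get(args.length > 0 ? args[0] : "lib");
        System.out.println(versionInLibDir(lib).orElse("gremlin-core jar not found"));
    }
}
```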
The latest JanusGraph version reports Gremlin version 1.2, but the latest Gremlin version is 3.6. How can I upgrade to the latest Gremlin version?
-- Senthilkumar