
Traversal bindings of dynamically created graphs are not propagated in a multi-node cluster

Anton Eroshenko <erosh.anton@...>
 

Hi
We use dynamically created graphs in a multi-node JanusGraph cluster. With a single JanusGraph node it seems to work, but with more than one node, synchronization between the JanusGraph nodes doesn't work: Gremlin Server on some nodes does not recognize the newly created graph traversal. 
The documentation says there is a maximum lag of 20s for the binding to take effect on any node in the cluster, but in fact the new traversal is bound only on the node that received the request, not on the others, no matter how long you wait. So it looks like a bug. 
We're creating a new graph with 
ConfiguredGraphFactory.create(graphName)
It is created successfully, but not propagated to other nodes. 

As a workaround I'm calling ConfiguredGraphFactory.open(graphName) on an unsynced instance, but this is not reliable, since from a Java application you don't know which instance the load balancer will route you to. 

I attached a docker-compose file that reproduces the problem. It defines two JanusGraph instances that expose different ports. Be aware that two JanusGraph instances starting up at the same time results in a concurrency error on one of the nodes, which is another issue with the multi-node configuration. So I simply stop one of the containers on start-up and restart it later. 


Re: Query not returning always the same result

hadoopmarc@...
 

Hi Adrian,

What happens if you rewrite the query to:

lmg.traversal().V(analysisVertex).out().emit().repeat(
                __.in().choose(
                        __.hasLabel("result"),
                        __.has("analysisId", analysisId),
                        __.identity()
                )
        ).tree().next().getTreesAtDepth(3);

I do not understand how leaving out the else clause leads to the random behavior you describe, but it won't hurt to state the intended else clause explicitly. If the else clause is not a valid case in your data model, you do not need the choose() step.

Best wishes,   Marc
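
To illustrate the explicit else branch suggested above, here is a toy model in plain Python. It is not TinkerPop code: the `choose`, `has_analysis_id` and `identity` helpers and the sample vertices are hypothetical stand-ins. It only shows the intended semantics: "result" vertices are filtered by analysisId, and everything else passes through unchanged.

```python
# Toy model of choose(predicate, true_branch, false_branch) applied per vertex.
# Vertices are plain dicts; "label" and "analysisId" mirror the names in the query above.

def choose(vertices, predicate, true_branch, false_branch):
    """Route each vertex through one branch; a branch returns the vertex or None (filtered out)."""
    out = []
    for v in vertices:
        branch = true_branch if predicate(v) else false_branch
        kept_vertex = branch(v)
        if kept_vertex is not None:
            out.append(kept_vertex)
    return out


def has_analysis_id(analysis_id):
    # Filter branch: keep the vertex only when its analysisId matches.
    return lambda v: v if v.get("analysisId") == analysis_id else None


def identity(v):
    # Explicit else branch: pass the vertex through unchanged.
    return v


# Hypothetical sample data: one group and two results from different analyses.
vertices = [
    {"label": "group", "name": "g1"},
    {"label": "result", "analysisId": 7},
    {"label": "result", "analysisId": 8},
]

kept = choose(vertices, lambda v: v["label"] == "result", has_analysis_id(7), identity)
print(kept)
```

In this model the group vertex survives because of the identity branch, and only the result with the matching analysisId is kept.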


Query not returning always the same result

Adrián Abalde Méndez <aabalde@...>
 

Hello,

I'm seeing strange behaviour with JanusGraph and I would like to post it here to see if anyone can help.

The thing is that I'm doing a tree query to get my graph data structured as a tree, and from there I build the results I'm interested in. This query works fine, but the problem is that I don't get the same results every time. It doesn't make any sense that the query returns different trees if the graph is the same and hasn't changed, does it?

The two trees I'm getting are not very different from each other. We have a node type called "group", and some other nodes called "results" hanging from these "groups". Sometimes the tree comes with the results and sometimes without, but it always has the "group" structure.

In case you want to know it, the query I'm performing is this one:


lmg.traversal().V(analysisVertex).out().emit().repeat(
                __.in().choose(
                        __.label().is(P.eq("result")),
                        __.where(__.has("analysisId", analysisId))
                )
        ).tree().next().getTreesAtDepth(3);


where starting from an "analysis" node, I filter the graph to just have a tree with the groups and the results with the analysisId I'm interested in.

I guess it is not a problem with the query itself, because when it has the results, it works fine. But I don't know why I am getting this strange, inconsistent behaviour.

Any ideas about this? Thanks in advance :)

Best regards,
Adrian


Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

hadoopmarc@...
 

Hi,
You did not answer my questions about the "id" property. TinkerPop uses a Token.ID that has the value 'id', see:

https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/T.java

I suspect that you ingested data without schema validation ("automatic schema creation"), that your input data contains an "id" property key and that JanusGraph/TinkerPop get confused about which id is what. So I strongly suggest that you make sure that this is not the root cause of this issue. To be sure, it would still be an issue but not for you anymore :-)

Best wishes,    Marc


Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

sauverma
 

Another really strange observation

gremlin> g.V().has('id','131594d6a416666b401a9e48e54ebc8f22be75e2593c5d98e2d9ecfd719d5f29').has('type','email_sha256_lowercase').valueMap(true)
==>[dpts_678:[1595548800],label:vertex,id:201523209257056,id:[19df651e-90d5-47f6-af2e-35dcb59bcc0a],type:[id_mid_10],soft_del:[false],country_GBR:[678]]


Could you please have a look?


Re: Count Query Optimization

Boxuan Li
 

Have you tried keeping query.batch = true AND query.fast-property = true?

Regards,
Boxuan

On Mar 22, 2021, at 8:28 PM, Vinayak Bali <vinayakbali16@...> wrote:

Hi All,

Adding these properties to the configuration file affects edge traversal. Retrieving a single edge takes 7 minutes. 
1) Turn on query.batch
2) Turn off query.fast-property
The count query is faster, but edge traversal becomes more expensive.
Is there any other way to improve count performance without affecting other queries?

Thanks & Regards,
Vinayak


Re: Count Query Optimization

Vinayak Bali
 

Hi All,

Adding these properties to the configuration file affects edge traversal. Retrieving a single edge takes 7 minutes. 
1) Turn on query.batch
2) Turn off query.fast-property
The count query is faster, but edge traversal becomes more expensive.
Is there any other way to improve count performance without affecting other queries?

Thanks & Regards,
Vinayak


Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

sauverma
 

Hi

The issue still persists, and the vertex metadata is still missing for some vertices, even after enabling the settings described at https://docs.janusgraph.org/advanced-topics/eventual-consistency/. Has anyone seen the same issue?

The issue is logged at https://github.com/JanusGraph/janusgraph/issues/2515

Thanks


Re: Janusgraph 0.5.3 potential memory leak

Oleksandr Porunov
 

Opened the issue about this potential bug here: https://github.com/JanusGraph/janusgraph/issues/2524


Re: ScriptExecutor Deprecated but Used in gremlin.bat

hadoopmarc@...
 

Hi Fredrick,

You are right, this is an issue, so if you want to report this: thanks.

Best wishes,    Marc


Re: Count Query Optimization

AMIYA KUMAR SAHOO
 

Hi Vinayak,

Try below. If it works for you, you can add E2 and D similarly.

g.V().has('property1', 'A').
   outE().has('property1', 'E').as('e').
   inV().has('property1', 'B').
   outE().has('property1', 'E1').as('e').
   where(inV().has('property1', 'C')).
   select(all, 'e').fold().
   project('edgeCount', 'vertexCount').
     by(count(local)).
     by(unfold().bothV().dedup().count())

Regards,
Amiya


Re: Duplicate Vertex

kumkar.dev@...
 

Hi Boxuan Li,

Hope this helps:
---------------------------------------------------------------------------------------------------
Vertex Index Name              | Type        | Unique    | Backing        | Key:           Status |
---------------------------------------------------------------------------------------------------
by_prop1                       | Composite   | false     | internalindex  | prop1:        ENABLED |
by_prop2                       | Composite   | false     | internalindex  | prop2:        ENABLED |

- Dev


Re: Duplicate Vertex

Boxuan Li
 

Hi, can you share more details (what indexes do you have related to prop1 and/or prop2), or even minimal code to reproduce? 


Duplicate Vertex

kumkar.dev@...
 

Hello

We are on JanusGraph 0.4.0 and hit a scenario in which duplicate vertices were created. 
The 2 vertices were created within a span of 9 milliseconds in a single transaction.
We are using an index to look up vertices in the graph.

The vertex is identified by 2 identifiers/properties, prop1 and prop2, and it has other properties as well.
Property matches check whether the vertex is already present, and the vertex is then created or updated accordingly. 

There are two property matches to check for vertex existence.
  1. Match1 = prop1
  2. Match2 = prop1 OR prop2

The first vertex was created with this property match at time t1:
  • prop1='value-foo'
The second vertex was created with this property match at t1 + 9 milliseconds:
  • prop1='value-foo' OR prop2='value-bar'
The second instance was not able to find the vertex with prop1='value-foo' that had been created 9 milliseconds earlier.
Could this be a problem with reading the in-memory cache? Are there known issues in this area where the index does not return a vertex, resulting in this behaviour? 

Thanks
Dev


Re: How to circumvent transaction cache?

timon.schneider@...
 

Thanks for your thoughts.
1) I'm very interested to try out the PR you made for this issue.
2) I don't think the solution you gave me in that previous thread solves the issue. What if another user sets version_v.published to true between steps 3 and 4? This is allowed even with ConsistencyModifier.LOCK on the vertex and properties of version_v.

1. start_transaction();
2. read_vertex(type_v);
3. read_vertex(version_v); // type_v ——hasVersion—> version_v
4. if (version_v.published == true) then abort();
5. update_vertex(type_v);
6. update_vertex(version_v); // set version_v.published = true
7. commit();
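
The interleaving described above can be sketched with a small in-memory model. This is plain Python, not the JanusGraph API: the `Store` and `Transaction` classes and the key names are hypothetical, and the model assumes that commit applies buffered writes without re-validating earlier reads. Whether JanusGraph's locking catches this case is exactly the open question in this thread.

```python
# Hypothetical in-memory model of the seven steps above; none of this is JanusGraph API.

class Store:
    def __init__(self):
        self.data = {"type_v.state": "original", "version_v.published": False}


class Transaction:
    """Buffers writes and applies them at commit WITHOUT re-validating earlier reads (assumption)."""

    def __init__(self, store):
        self.store = store
        self.writes = {}

    def read(self, key):
        return self.store.data[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.store.data.update(self.writes)


store = Store()

tx = Transaction(store)                          # step 1
tx.read("type_v.state")                          # step 2
published_seen = tx.read("version_v.published")  # step 3, sees False

# Between steps 3 and 4 another user publishes the version.
other = Transaction(store)
other.write("version_v.published", True)
other.commit()

if published_seen:                               # step 4 checks the stale value
    print("abort")
else:
    tx.write("type_v.state", "updated")          # step 5
    tx.write("version_v.published", True)        # step 6
    tx.commit()                                  # step 7

print(store.data)
```

In this model the first transaction never aborts, so type_v is updated even though version_v had already been published by the other user.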


Re: Count Query Optimization

Vinayak Bali
 

Amiya - I need to check the data, there is some mismatch with the counts.

Consider we have more than one relation to get the count. How can we modify the query?

For example:
 
A->E->B query is as follows:
g.V().has('property1', 'A').
   outE().has('property1','E').
       where(inV().has('property1', 'B')). fold().
   project('edgeCount', 'vertexCount').
            by(count(local)).
            by(unfold().bothV().dedup().count())

A->E->B->E1->C->E2->D

What changes can be made to the query?

Thanks



Re: Count Query Optimization

AMIYA KUMAR SAHOO
 

Hi Vinayak,

The correct vertex count is 400332 non-unique, or 34693 unique.

In g.V().has('property1', 'A').aggregate('v'), all vertices with property1 = A may be included in the count of your second query because of eager evaluation, whether or not they have an outE with property1 = E.

Regards,
Amiya
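
The difference between the counts can be illustrated with a tiny worked example. This is a sketch in plain Python over a made-up five-vertex graph, not Gremlin: the vertex ids and edges are hypothetical. It shows why the non-unique count is exactly twice the edge count (consistent with 400332 = 2 × 200166 in this thread) and why collecting all 'A' vertices up front can inflate the vertex count.

```python
# Hypothetical miniature graph: vertex id -> value of property1.
vertex_kind = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}

# Edges with property1 == 'E' as (out_vertex, in_vertex) pairs; a3 has no such edge.
edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]

# Edges that go from an 'A' vertex to a 'B' vertex.
matching = [(o, i) for (o, i) in edges if vertex_kind[o] == "A" and vertex_kind[i] == "B"]
edge_count = len(matching)

# Emitting both endpoints of every edge gives two vertices per edge.
both_v = [v for edge in matching for v in edge]
non_unique = len(both_v)      # always 2 * edge_count
unique = len(set(both_v))     # endpoints counted once

# Collecting every 'A' vertex first, whether or not it has a matching edge,
# also picks up a3 and so over-counts.
aggregated = {v for v, kind in vertex_kind.items() if kind == "A"} | {i for (_, i) in matching}
aggregated_count = len(aggregated)

print(edge_count, non_unique, unique, aggregated_count)
```

Here a3 is counted only by the last approach, which mirrors the explanation above of why the aggregate-based query reports a larger vertex count.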


Re: Count Query Optimization

Vinayak Bali
 

Hi Amiya,

With dedup:
g.V().has('property1', 'A').
   outE().has('property1','E').
       where(inV().has('property1', 'B')). fold().
   project('edgeCount', 'vertexCount').
            by(count(local)).
            by(unfold().bothV().dedup().count())
Output: ==>[edgeCount:200166,vertexCount:34693]

without dedup:
g.V().has('property1', 'A').
   outE().has('property1','E').
       where(inV().has('property1', 'B')). fold().
   project('edgeCount', 'vertexCount').
            by(count(local)).
            by(unfold().bothV().count())
Output: ==>[edgeCount:200166,vertexCount:400332]

Both queries are taking approx 3 sec to run.

Query: g.V().has('property1', 'A').aggregate('v').outE().has('property1','E').aggregate('e').inV().has('property1', 'B').aggregate('v').select('v').dedup().as('vetexCount').select('e').dedup().as('edgeCount').select('vetexCount','edgeCount').by(unfold().count())
Output: ==>[vetexCount:383633,edgeCount:200166]
Time: 3.5 mins

The edge count is the same for all the queries, but the vertexCount differs. Which one is the right vertex count?

Thanks & Regards,
Vinayak


Re: Count Query Optimization

AMIYA KUMAR SAHOO
 

Hi Vinayak,

Maybe try the query below.

g.V().has('property1', 'A').
   outE().has('property1','E').
       where(inV().has('property1', 'B')). fold().
   project('edgeCount', 'vertexCount').
            by(count(local)).
            by(unfold().bothV().dedup().count())    // I do not think dedup is required for your use case, can try both with and without dedup

Regards, Amiya


Re: Janusgraph - OLAP using Dataproc

kndoan94@...
 

Hi Claire! 

Would you mind sharing the pom.xml file for your build? I'm trying a similar build for AWS and am hitting a mess of dependency errors.

Thank you :)
Ben
