Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

hadoopmarc@...
 

Hi,
You did not answer my questions about the "id" property. TinkerPop uses a Token.ID that has the value 'id', see:

https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/T.java

I suspect that you ingested data without schema validation ("automatic schema creation"), that your input data contains an "id" property key and that JanusGraph/TinkerPop get confused about which id is what. So I strongly suggest that you make sure that this is not the root cause of this issue. To be sure, it would still be an issue but not for you anymore :-)

Best wishes,    Marc


Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

sauverma
 

Another really strange observation

gremlin> g.V().has('id','131594d6a416666b401a9e48e54ebc8f22be75e2593c5d98e2d9ecfd719d5f29').has('type','email_sha256_lowercase').valueMap(true)
==>[dpts_678:[1595548800],label:vertex,id:201523209257056,id:[19df651e-90d5-47f6-af2e-35dcb59bcc0a],type:[id_mid_10],soft_del:[false],country_GBR:[678]]


Could you please have a look?


Re: Count Query Optimization

Boxuan Li
 

Have you tried keeping query.batch = true AND query.fast-property = true?

Regards,
Boxuan


Re: Count Query Optimization

Vinayak Bali
 

Hi All,

Adding these properties to the configuration file affects edge traversal: retrieving a single edge now takes 7 minutes.
1) Turn on query.batch
2) Turn off query.fast-property
The count query is faster, but edge traversal becomes more expensive.
Is there any other way to improve count performance without affecting other queries?

Thanks & Regards,
Vinayak


Re: Multiple vertices generated for the same index value and vertex properties missing with RF3

sauverma
 

Hi

The issue still persists: vertex metadata is still missing for some vertices after enabling https://docs.janusgraph.org/advanced-topics/eventual-consistency/. Has anyone seen the same issue?

The issue is logged at https://github.com/JanusGraph/janusgraph/issues/2515

Thanks


Re: Janusgraph 0.5.3 potential memory leak

Oleksandr Porunov
 

Opened the issue about this potential bug here: https://github.com/JanusGraph/janusgraph/issues/2524


Re: ScriptExecutor Deprecated but Used in gremlin.bat

hadoopmarc@...
 

Hi Fredrick,

You are right, this is an issue, so if you want to report this: thanks.

Best wishes,    Marc


Re: Count Query Optimization

AMIYA KUMAR SAHOO
 

Hi Vinayak,

Try the query below. If it works for you, you can add E2 and D similarly.

g.V().has('property1', 'A').
   outE().has('property1', 'E').as('e').
   inV().has('property1', 'B').
   outE().has('property1', 'E1').as('e').
   where(inV().has('property1', 'C')).
   select(all, 'e').fold().
   project('edgeCount', 'vertexCount').
      by(count(local)).
      by(unfold().bothV().dedup().count())
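
A note on what the folded list holds, written as a minimal plain-Python model rather than Gremlin. It assumes that select(all, 'e') returns one list of 'e'-labelled edges per matched path, so that fold() collects one entry per path. Under that assumption the size of the folded list is the number of paths, and an edge shared by several paths appears more than once. The edge ids are invented for illustration.

```python
from itertools import chain

# Each tuple models one matched path A -E-> B -E1-> C and holds the two
# edges labelled 'e' on that path. All ids are made up.
paths = [
    ("e1", "f1"),  # first hop e1, second hop f1
    ("e1", "f2"),  # same first hop, different second hop
    ("e2", "f3"),
]

# Size of the folded collection: one entry per path.
path_count = len(paths)

# Flatten the per-path lists and drop repeats to get the distinct edges.
distinct_edges = set(chain.from_iterable(paths))
distinct_edge_count = len(distinct_edges)

print(path_count, distinct_edge_count)
```

In Gremlin terms this would presumably mean flattening and deduplicating the selected edges (unfold().dedup()) before the final fold(); that variant is untested here.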

Regards,
Amiya


Re: Duplicate Vertex

kumkar.dev@...
 

Hi Boxuan Li,

Hope this helps:
---------------------------------------------------------------------------------------------------
Vertex Index Name              | Type        | Unique    | Backing        | Key:           Status |
---------------------------------------------------------------------------------------------------
by_prop1                       | Composite   | false     | internalindex  | prop1:        ENABLED |
by_prop2                       | Composite   | false     | internalindex  | prop2:        ENABLED |

- Dev


Re: Duplicate Vertex

Boxuan Li
 

Hi, can you share more details (what indexes do you have related to prop1 and/or prop2), or even minimal code to reproduce? 


Duplicate Vertex

kumkar.dev@...
 

Hello

We are on JanusGraph 0.4.0 and hit a scenario in which duplicate vertices were created.
The 2 vertices were created within a span of 9 milliseconds in a single transaction.
We use an index to look up vertices in the graph.

The vertex is identified by 2 identifiers/properties, prop1 and prop2, and it has other properties as well.
Property matches check whether the vertex is already present, and the vertex is then created or updated accordingly.

There are two property matches to check for vertex existence.
  1. Match1 = prop1
  2. Match2 = prop1 OR prop2

The first vertex was created with this property match, at t1:
  • prop1='value-foo'
The second vertex was created with this property match, at t1+9 milliseconds:
  • prop1='value-foo' OR prop2='value-bar'
The second instance was not able to find the vertex with prop1='value-foo' that had been created 9 milliseconds earlier.
Could this be a problem with reading the in-memory cache? Are there known issues in this area where the index does not return the vertex, resulting in this issue?
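
For reference, the intended check-then-create logic can be sketched in plain Python. This is only a model of the expected behaviour with an in-memory list standing in for the graph; the helper names find_vertex and get_or_create are hypothetical, and it says nothing about how JanusGraph resolves the index lookup internally.

```python
from typing import Dict, List, Optional

Vertex = Dict[str, str]


def find_vertex(vertices: List[Vertex], match: Dict[str, str]) -> Optional[Vertex]:
    """Return the first vertex that matches ANY key/value in `match` (OR semantics)."""
    for vertex in vertices:
        if any(vertex.get(key) == value for key, value in match.items()):
            return vertex
    return None


def get_or_create(vertices: List[Vertex], match: Dict[str, str],
                  properties: Dict[str, str]) -> Vertex:
    """Update the matching vertex if one exists, otherwise create a new one."""
    existing = find_vertex(vertices, match)
    if existing is not None:
        existing.update(properties)
        return existing
    created = dict(properties)
    vertices.append(created)
    return created


store: List[Vertex] = []
# Match1 = prop1
first = get_or_create(store, {"prop1": "value-foo"}, {"prop1": "value-foo"})
# Match2 = prop1 OR prop2, issued right after the first call
second = get_or_create(
    store,
    {"prop1": "value-foo", "prop2": "value-bar"},
    {"prop1": "value-foo", "prop2": "value-bar"},
)
print(len(store), first is second)
```

When the second lookup sees the first vertex, it updates that vertex instead of creating a new one, which is the behaviour expected here; a duplicate only appears if the lookup misses the earlier write.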

Thanks
Dev


Re: How to circumvent transaction cache?

timon.schneider@...
 

Thanks for your thoughts.
1) I'm very interested to try out the PR you made for this issue.
2) I don't think the solution you gave me in that previous thread solves the issue. What if another user sets version_v.published to true between steps 3 and 4? This is allowed even with ConsistencyModifier.LOCK on the vertex and properties of version_v.

1. start_transaction();
2. read_vertex(type_v);
3. read_vertex(version_v); // type_v ——hasVersion—> version_v
4. if (version_v.published == true) then abort();
5. update_vertex(type_v);
6. update_vertex(version_v); // set version_v.published = true
7. commit();
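
The interleaving in question can be modelled in a few lines of plain Python. This is a generic optimistic check-at-commit sketch, not JanusGraph's locking mechanism; the revision counter and the commit_publish helper are hypothetical and exist only to make the race visible.

```python
class VersionVertex:
    """Stand-in for version_v with a hypothetical revision counter."""

    def __init__(self) -> None:
        self.published = False
        self.revision = 0  # bumped on every committed write


def commit_publish(vertex: VersionVertex, revision_seen: int) -> bool:
    """Set published = True only if nobody committed a write since we read."""
    if vertex.revision != revision_seen:
        return False  # someone else changed the vertex in the meantime
    vertex.published = True
    vertex.revision += 1
    return True


version_v = VersionVertex()

# Transaction 1, steps 3-4: read version_v and see published == False.
t1_revision_seen = version_v.revision
t1_may_proceed = not version_v.published

# Another user publishes before transaction 1 reaches step 7.
t2_committed = commit_publish(version_v, version_v.revision)

# Transaction 1, step 7: the commit is rejected because the revision moved.
t1_committed = commit_publish(version_v, t1_revision_seen)

print(t1_may_proceed, t2_committed, t1_committed)
```

Without a check of this kind at commit time, transaction 1 would pass the test in step 4 and still commit after the other user has published.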


Re: Count Query Optimization

Vinayak Bali
 

Amiya - I need to check the data, there is some mismatch with the counts.

Suppose we need the counts across more than one relation. How can the query be modified?

For example:
 
A->E->B query is as follows:
g.V().has('property1', 'A').
   outE().has('property1', 'E').
   where(inV().has('property1', 'B')).
   fold().
   project('edgeCount', 'vertexCount').
      by(count(local)).
      by(unfold().bothV().dedup().count())

A->E->B->E1->C->E2->D

What changes are needed in the query for this longer path?

Thanks




Re: Count Query Optimization

AMIYA KUMAR SAHOO
 

Hi Vinayak,

The correct vertex count is 400332 (non-unique) or 34693 (unique).

In your second query, g.V().has('property1', 'A').aggregate('v') is evaluated eagerly, so all vertices with property1 = A are probably included in the count, whether or not they have an outE with property1 = E.
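
The relationship between the three numbers can be checked on a toy graph in plain Python. This is a model of the counting only, with made-up vertex names; it relies on bothV() emitting both endpoints of every edge, which is why the non-unique count in your output (400332) is exactly twice the edge count (200166).

```python
# Toy edge list: (out_vertex, in_vertex). All names are invented.
edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]

# Vertices with property1 = 'A'. "a3" has no outgoing 'E' edge at all.
a_vertices = {"a1", "a2", "a3"}

edge_count = len(edges)

# bothV(): both endpoints of every edge, so twice the edge count before dedup.
both_v = [vertex for edge in edges for vertex in edge]
non_unique_count = len(both_v)
unique_count = len(set(both_v))

# aggregate('v') right after the first has(): every 'A' vertex is stored
# before any edge filter runs, so "a3" ends up in the count as well.
aggregated = a_vertices | {in_vertex for _, in_vertex in edges}
aggregated_count = len(aggregated)

print(edge_count, non_unique_count, unique_count, aggregated_count)
```

The aggregated count is larger than the unique bothV() count by the number of 'A' vertices that have no matching edge.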

Regards,
Amiya


Re: Count Query Optimization

Vinayak Bali
 

Hi Amiya,

With dedup:
g.V().has('property1', 'A').
   outE().has('property1', 'E').
   where(inV().has('property1', 'B')).
   fold().
   project('edgeCount', 'vertexCount').
      by(count(local)).
      by(unfold().bothV().dedup().count())
Output: ==>[edgeCount:200166,vertexCount:34693]

without dedup:
g.V().has('property1', 'A').
   outE().has('property1', 'E').
   where(inV().has('property1', 'B')).
   fold().
   project('edgeCount', 'vertexCount').
      by(count(local)).
      by(unfold().bothV().count())
Output: ==>[edgeCount:200166,vertexCount:400332]

Both queries are taking approx 3 sec to run.

Query: g.V().has('property1', 'A').aggregate('v').outE().has('property1','E').aggregate('e').inV().has('property1', 'B').aggregate('v').select('v').dedup().as('vetexCount').select('e').dedup().as('edgeCount').select('vetexCount','edgeCount').by(unfold().count())
Output: ==>[vetexCount:383633,edgeCount:200166]
Time: 3.5 mins

The edge count is the same for all the queries, but the vertex counts differ. Which one is the right vertex count?

Thanks & Regards,
Vinayak



Re: Count Query Optimization

AMIYA KUMAR SAHOO
 

Hi Vinayak,

Maybe try the query below.

g.V().has('property1', 'A').
   outE().has('property1', 'E').
   where(inV().has('property1', 'B')).
   fold().
   project('edgeCount', 'vertexCount').
      by(count(local)).
      by(unfold().bothV().dedup().count())    // I do not think dedup() is required for your use case; try both with and without it

Regards, Amiya


Re: Janusgraph - OLAP using Dataproc

kndoan94@...
 

Hi Claire! 

Would you mind sharing the pom.xml file for your build? I'm trying a similar build for AWS and am hitting a mess of dependency errors.

Thank you :)
Ben


Re: Caused by: org.janusgraph.core.JanusGraphException: A JanusGraph graph with the same instance id [0a000439355-0b2b58ca5c222] is already open. Might required forced shutdown.

hadoopmarc@...
 

Hi Srinivas,

In the yaml file determining class Settings you use the channelizer key twice. If you use ConfigurationManagementGraph, only the following line should be present:
channelizer: org.janusgraph.channelizers.JanusGraphWebSocketChannelizer

Does that make any difference? Does the part "with one argument of class Settings," still show up in the ERROR message then?

Best wishes,    Marc


Re: Count Query Optimization

hadoopmarc@...
 

Hi Vinayak,

Another attempt, this one is very similar to the one that works.

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-inmemory.properties')
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
gremlin> GraphOfTheGodsFactory.loadWithoutMixedIndex(graph,true)
==>null

gremlin> g.V().as('v1').outE().as('e').inV().as('v2').union(select('v1'), select('v2')).dedup().count()
16:12:39 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>12

gremlin> g.V().as('v1').outE().as('e').inV().as('v2').select('e').dedup().count()
16:15:30 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>17

gremlin> g.V().as('v1').outE().as('e').inV().as('v2').union(
......1>     union(select('v1'), select('v2')).dedup().count(),
......2>     select('e').dedup().count().as('ecount')
......3>     )
16:27:42 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>12
==>17
   
Best wishes,    Marc


Re: Count Query Optimization

Nicolas Trangosi
 

Hi,
You could try denormalization: also store the in-vertex's property1 on the edge (as inVproperty1 in the query below).
Once the edges are updated, the following query should work:

g.V().has('property1', 'A').aggregate('v').outE().has('property1','E').has('inVproperty1', 'B').aggregate('e').inV().aggregate('v').select('v').dedup().as('vetexCount').select('e').dedup().as('edgeCount').select('vetexCount','edgeCount').by(unfold().count())
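
The idea behind the denormalization, as a minimal plain-Python model with invented data: copy the in-vertex's property1 onto each edge once, after which the filter only needs edge properties and no longer has to load the in-vertex. Only the key name inVproperty1 comes from the query above; everything else is illustrative.

```python
# Toy graph with made-up ids.
vertices = {
    "a1": {"property1": "A"},
    "b1": {"property1": "B"},
    "c1": {"property1": "C"},
}
edges = [
    {"out": "a1", "in": "b1", "property1": "E"},
    {"out": "a1", "in": "c1", "property1": "E"},
]

# One-off update: store the in-vertex's property1 on the edge itself.
for edge in edges:
    edge["inVproperty1"] = vertices[edge["in"]]["property1"]

# The filter now reads edge properties only.
matching = [
    edge for edge in edges
    if edge["property1"] == "E" and edge["inVproperty1"] == "B"
]

print(len(matching), matching[0]["in"])
```

The cost is that inVproperty1 has to be kept in sync whenever property1 changes on a vertex.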


On Wed, Mar 17, 2021 at 14:05, Vinayak Bali <vinayakbali16@...> wrote:
Hi Marc,

Using local returns the output after each count. For example:

==>[vetexCount:184439,edgeCount:972]
==>[vetexCount:184440,edgeCount:973]
==>[vetexCount:184441,edgeCount:974]
==>[vetexCount:184442,edgeCount:975]
==>[vetexCount:184443,edgeCount:976]
==>[vetexCount:184444,edgeCount:977]
==>[vetexCount:184445,edgeCount:978]
==>[vetexCount:184446,edgeCount:979]
==>[vetexCount:184447,edgeCount:980]
==>[vetexCount:184448,edgeCount:981]
==>[vetexCount:184449,edgeCount:982]
==>[vetexCount:184450,edgeCount:983]
==>[vetexCount:184451,edgeCount:984]
==>[vetexCount:184452,edgeCount:985]
==>[vetexCount:184453,edgeCount:986]
==>[vetexCount:184454,edgeCount:987]
==>[vetexCount:184455,edgeCount:988]
==>[vetexCount:184456,edgeCount:989]
==>[vetexCount:184457,edgeCount:990]
==>[vetexCount:184458,edgeCount:991]
==>[vetexCount:184459,edgeCount:992]
==>[vetexCount:184460,edgeCount:993]
==>[vetexCount:184461,edgeCount:994]
==>[vetexCount:184462,edgeCount:995]
==>[vetexCount:184463,edgeCount:996]
==>[vetexCount:184464,edgeCount:997]
==>[vetexCount:184465,edgeCount:998]

You can suggest some other approach too. I really need it working.

Thanks & Regards,
Vinayak

On Wed, Mar 17, 2021 at 5:54 PM <hadoopmarc@...> wrote:
Hi Vinayak,

Referring to your last post, what happens if you use aggregate(local, 'v') and aggregate(local, 'e')? The local modifier makes the aggregate() step lazy, which hopefully gives JanusGraph more opportunity to batch the storage backend requests.
https://tinkerpop.apache.org/docs/current/reference/#store-step
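
The difference between the eager and the lazy form can be modelled with two Python generators. This is a sketch of the semantics only, not of the TinkerPop implementation, and the function names are hypothetical. The eager variant stores everything before releasing any traverser; the lazy variant stores one item and releases it immediately, so anything reading the collection downstream sees a running total.

```python
from typing import Iterable, Iterator, List


def eager_aggregate(traversers: Iterable[str], bucket: List[str]) -> Iterator[str]:
    """Barrier: store all items first, then release them."""
    items = list(traversers)
    bucket.extend(items)
    yield from items


def lazy_aggregate(traversers: Iterable[str], bucket: List[str]) -> Iterator[str]:
    """No barrier: store one item, release it, then continue."""
    for item in traversers:
        bucket.append(item)
        yield item


eager_bucket: List[str] = []
lazy_bucket: List[str] = []

# Record the bucket size that each released traverser observes.
eager_sizes = [len(eager_bucket) for _ in eager_aggregate(["v1", "v2", "v3"], eager_bucket)]
lazy_sizes = [len(lazy_bucket) for _ in lazy_aggregate(["v1", "v2", "v3"], lazy_bucket)]

print(eager_sizes)  # [3, 3, 3]
print(lazy_sizes)   # [1, 2, 3]
```

With the lazy form, a downstream step that reads the collection per traverser would see a size that grows by one each time.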

Best wishes,    Marc



