
Re: ID block allocation exception while creating edge

anjanisingh22@...
 

We have 250 parallel Spark tasks running to create nodes/edges.
I didn't understand what you mean by parallel tasks (for setting ids.num-partitions). Could you please explain?


Re: ID block allocation exception while creating edge

hadoopmarc@...
 

What is the number of parallel tasks? (This is needed for setting ids.num-partitions.)

You still have ids.authority.wait-time at its default value of 300 ms, so that seems worth experimenting with.
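As a rough sketch of how the ID-allocation options named above could be grouped before being passed to the graph builder, the Java helper below collects them into a plain map. The class and method names and all values (32 partitions, a block size of 20000, a 1000 ms wait time) are hypothetical placeholders to experiment with, not recommendations from this thread; in real code each entry would be handed to `JanusGraphFactory.Builder.set(key, value)`.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

public class IdAllocationOptions {

    // Hypothetical helper: collects the ID-allocation options named in this thread.
    // The values passed in are placeholders to experiment with, not tuned recommendations.
    static Map<String, String> idAllocationOptions(int numPartitions, long blockSize,
                                                   Duration authorityWaitTime) {
        if (numPartitions <= 0 || blockSize <= 0) {
            throw new IllegalArgumentException("numPartitions and blockSize must be positive");
        }
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("ids.num-partitions", Integer.toString(numPartitions)); // key as written in this thread
        opts.put("ids.block-size", Long.toString(blockSize));
        // Written as plain milliseconds, the same style the thread's createJanusConnection uses.
        opts.put("ids.authority.wait-time", Long.toString(authorityWaitTime.toMillis()));
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = idAllocationOptions(32, 20_000L, Duration.ofMillis(1000));
        // Each entry could then be passed to JanusGraphFactory.Builder.set(key, value).
        opts.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Keeping the options in one map makes it easy to vary a single setting (for example only the wait time) between test runs.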

Best wishes,    Marc


Re: ID block allocation exception while creating edge

anjanisingh22@...
 

On Tue, May 11, 2021 at 11:54 AM, <hadoopmarc@...> wrote:
https://docs.janusgraph.org/advanced-topics/bulk-loading/#optimizing-id-allocation

Thanks for the response, Marc. Below is the method I am using to create the JanusGraph connection:

public JanusGraph createJanusConnection(HashMap<String, Object> janusConfig) {
    // GREMLIN_GRAPH, STORAGE_BACKEND, etc. are String configuration keys defined
    // elsewhere (not shown in this post); logger is a field of the enclosing class.
    JanusGraphFactory.Builder configProps = JanusGraphFactory.build();
    configProps.set(GREMLIN_GRAPH, "org.janusgraph.core.JanusGraphFactory");
    configProps.set(STORAGE_BACKEND, "cql");
    configProps.set(STORAGE_HOSTNAME, janusConfig.get("storage.hostname"));
    configProps.set(STORAGE_CQL_KEYSPACE, janusConfig.get("storage.keyspace"));
    configProps.set(CACHE_DB_CACHE, "false");
    configProps.set(CACHE_DB_CACHE_SIZE, "0.5");
    configProps.set(CACHE_DB_CACHE_TIME, "180000");
    configProps.set(CACHE_DB_CACHE_CLEAN_WAIT, "20");
    configProps.set(STORAGE_CQL_LOCAL_DATACENTER, janusConfig.get("local-datacenter"));
    configProps.set(STORAGE_CQL_WRITE_CONSISTENCY_LEVEL, "LOCAL_ONE");
    configProps.set(STORAGE_CQL_READ_CONSISTENCY_LEVEL, "LOCAL_ONE");
    configProps.set(STORAGE_CQL_SSL_ENABLED, janusConfig.get("cql.ssl.enabled"));
    configProps.set(STORAGE_CQL_SSL_TRUSTSTORE_LOCATION, janusConfig.get("truststore.location"));
    configProps.set(STORAGE_CQL_SSL_TRUSTSTORE_PASSWORD, janusConfig.get("truststore.password"));
    configProps.set(STORAGE_USERNAME, janusConfig.get("cassandra.username"));
    configProps.set(STORAGE_PASSWORD, janusConfig.get("cassandra.password"));
    configProps.set("storage.read-time", "120000");
    configProps.set("storage.write-time", "120000");
    configProps.set("storage.connection-timeout", "120000");

    // added to fix ID block allocation exceptions
    // NOTE: these four keys have no namespace prefix; the similarly named documented
    // options are ids.renew-timeout, storage.write-time, storage.read-time and
    // ids.renew-percentage.
    configProps.set("renew-timeout", "240000");
    configProps.set("write-time", "1000");
    configProps.set("read-time", "100");
    configProps.set("renew-percentage", "0.4");

    configProps.set(METRICS_ENABLED, "true");
    configProps.set(METRICS_JMX_ENABLED, "true");
    configProps.set(INDEX_SEARCH_BACKEND, "elasticsearch");
    configProps.set(INDEX_SEARCH_HOSTNAME, janusConfig.get("elasticsearch.hostname"));
    configProps.set(INDEX_SEARCH_ELASTICSEARCH_HTTP_AUTH_TYPE, "basic");
    configProps.set(INDEX_SEARCH_ELASTICSEARCH_HTTP_AUTH_BASIC_USERNAME, janusConfig.get("elasticsearch.username"));
    configProps.set(INDEX_SEARCH_ELASTICSEARCH_HTTP_AUTH_BASIC_PASSWORD, janusConfig.get("elasticsearch.password"));
    configProps.set(INDEX_SEARCH_ELASTICSEARCH_SSL_ENABLED, janusConfig.get("elasticsearch.ssl.enabled"));
    configProps.set(IDS_BLOCK_SIZE, "1000000000");
    configProps.set(IDS_RENEW_PERCENTAGE, "0.3");
    logger.info("JanusGraph config initialization!!");
    return configProps.open();
}


Re: ID block allocation exception while creating edge

hadoopmarc@...
 

Hi Anjani,

Please show the properties file you use to open JanusGraph.
I assume you also saw the other recommendations in https://docs.janusgraph.org/advanced-topics/bulk-loading/#optimizing-id-allocation

Best wishes,   Marc


ID block allocation exception while creating edge

anjanisingh22@...
 

Hi All,

I am creating vertices and edges in bulk and get the error below while creating an edge. This is the exception log:

Cluster edge creation failed between guest node 20369408030929128 and identifier node 16891904008515712. Exception : org.janusgraph.core.JanusGraphException: ID block allocation on partition(29)-namespace(3) failed with an exception in 12.23 ms

I tried increasing the value of "ids.block-size", even setting it to 1B for testing purposes, but I still get the error above. I am creating around 4-5M nodes per hour.
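The bulk-loading page linked in this thread suggests sizing ids.block-size from the number of vertices expected per JanusGraph instance per hour, rather than from the cluster-wide total. A minimal worked sketch of that arithmetic follows, assuming (not confirmed in the thread) that each of the 250 parallel Spark tasks mentioned elsewhere opens its own JanusGraph instance and that the load is spread evenly; the class and method names are hypothetical.

```java
public class BlockSizeEstimate {

    // Hypothetical helper: vertices each JanusGraph instance is expected to add per hour,
    // assuming the load is spread evenly over the instances. Rounds up.
    static long verticesPerInstancePerHour(long verticesPerHour, int instances) {
        if (instances <= 0) {
            throw new IllegalArgumentException("instances must be positive");
        }
        return (verticesPerHour + instances - 1) / instances; // ceiling division
    }

    public static void main(String[] args) {
        long upperRate = 5_000_000L; // upper end of the 4-5M vertices/hour quoted in the thread
        int sparkTasks = 250;        // parallel Spark tasks mentioned in the thread
        System.out.println(verticesPerInstancePerHour(upperRate, sparkTasks)); // prints 20000
    }
}
```

Under these assumptions the per-instance figure is about 20,000 vertices per hour, several orders of magnitude below the 1B block size tried above.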

 

Could you please share some pointers to fix it? Appreciate your help and time.

Thanks,
Anjani


Re: Query Optimisation

hadoopmarc@...
 

Hi Vinayak,

Actually, query 4 was easier to rework. It could read somewhat like:
g.V().has('property1', 'vertex1').as('v1').outE().has('property1', 'edge1').limit(100).as('e').inV().has('property1', 'vertex1').as('v2').
    select('v1','e','v2').by(valueMap().by(unfold())).aggregate('x').fold().
  V().has('property1', 'vertex1').as('v1').outE().has('property1', 'edge2').limit(100).as('e').inV().has('property1', 'vertex2').as('v2').
    select('v1','e','v2').by(valueMap().by(unfold())).aggregate('x').fold().
  V().has('property1', 'vertex3').as('v1').outE().has('property1', 'edge3').limit(100).as('e').inV().has('property1', 'vertex2').as('v2').
    select('v1','e','v2').by(valueMap().by(unfold())).aggregate('x').fold().
  V().has('property1', 'vertex3').as('v1').outE().has('property1', 'Component_Of').limit(100).as('e').inV().has('property1', 'vertex1').as('v2').
    select('v1','e','v2').by(valueMap().by(unfold())).aggregate('x').fold().
  cap('x')

Best wishes,    Marc


Re: Query Optimisation

Vinayak Bali
 

Hi Marc,

Thank you for your reply. I will try to report this issue on the JanusGraph repository. Regarding the workaround you suggested, could you please share the updated query with the workaround applied to Query1? That will help me replicate it.

Thanks & Regards,

Vinayak

On Mon, 10 May 2021, 6:03 pm , <hadoopmarc@...> wrote:
Hi Vinayak,

If you take the trouble to demonstrate this behavior with a reproducible, generated graph, you can report it as an issue on GitHub.

For now, you can only look for workarounds:
 - combine the four clauses outside of gremlin
 - try g.V()......................fold().V()......................fold().V().......................fold().V()......................... instead of the union, although I am not sure janusgraph will use the index for the repeated V() steps. The fold() steps ensure that the V() steps are run exactly once.
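The first workaround, combining the four clauses outside of Gremlin, could look roughly like the Java sketch below. Each `Supplier` stands in for one clause; in real code it would wrap a traversal ending in `toList()`. The helper name `runAndCombine` is hypothetical, and running the clauses concurrently is an added assumption, not part of the suggestion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class CombineClauses {

    // Runs each clause concurrently and concatenates the results in clause order.
    // In real code each Supplier would wrap one traversal, e.g. g.V()....limit(100)....toList().
    static <T> List<T> runAndCombine(List<Supplier<List<T>>> clauses) {
        List<CompletableFuture<List<T>>> futures = new ArrayList<>();
        for (Supplier<List<T>> clause : clauses) {
            futures.add(CompletableFuture.supplyAsync(clause));
        }
        List<T> combined = new ArrayList<>();
        for (CompletableFuture<List<T>> f : futures) {
            combined.addAll(f.join()); // waits for this clause, keeps clause order
        }
        return combined;
    }

    public static void main(String[] args) {
        // Dummy clauses standing in for the four traversals.
        List<Supplier<List<String>>> clauses = List.of(
                () -> List.of("edge1-a", "edge1-b"),
                () -> List.of("edge2-a"),
                () -> List.of("edge3-a", "edge3-b"));
        System.out.println(runAndCombine(clauses)); // [edge1-a, edge1-b, edge2-a, edge3-a, edge3-b]
    }
}
```

Because each clause runs as its own query, each one can use the index for its own V() step, which avoids the open question of whether JanusGraph uses the index for repeated mid-traversal V() steps.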

Best wishes,    Marc


Re: Query Optimisation

Vinayak Bali
 

Hi Marc, 

After changing as() to aggregate() and select() to project(), this query takes 18 sec to run. Still, 99% of the time is spent computing the union. There is no memory issue; the heap is already set to 8g.

g.inject(1).union(V().has('property1', 'vertex1').aggregate('v1').union(outE().has('property1', 'edge1').aggregate('e').inV().has('property1', 'vertex1'),outE().has('property1', 'edge2').aggregate('e').inV().has('property1', 'vertex2')).aggregate('v2'),V().has('property1', 'vertex3').aggregate('v1').union(outE().has('property1', 'edge3').aggregate('e').inV().has('property1', 'vertex2'),outE().has('property1', 'Component_Of').aggregate('e').inV().has('property1', 'vertex1')).aggregate('v2')).limit(100).project('v1','e','v2').by(valueMap().by(unfold()))

Also, splitting the inner union step into separate clauses has the same effect.

Thanks & Regards,
Vinayak

On Mon, May 10, 2021 at 11:45 AM <hadoopmarc@...> wrote:
Hi Vinayak,

Your last remark explains it well: it seems that in JanusGraph a union of multiple clauses can take much longer than the sum of the individual clauses. There are still two things that we have not ruled out:

  • the repetition of as('v1') is unusual. Can you try what happens if you use the aggregate('v1')..............cap('v1', 'e', 'v2') mechanism instead? Or, simpler, what happens if you use neither the as() nor the aggregate() steps, omitting the formatting of the output?
  • are you sure there are no memory constraints? That seems unlikely given the limit(100) steps applied, but you can check by increasing the memory for the Gremlin Console:
    export JAVA_OPTIONS="-Xmx4g"
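One way to confirm that a heap setting such as -Xmx4g actually reached the JVM is to ask the runtime for its maximum heap. A minimal Java sketch follows (the class and method names are hypothetical); the same `Runtime.getRuntime().maxMemory()` call can also be typed directly into the Gremlin Console, which accepts Groovy.

```java
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in mebibytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With -Xmx4g this should report roughly 4 GiB; the exact figure depends on the JVM and GC.
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```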
Best wishes,    Marc


Re: Query Optimisation

Vinayak Bali
 

Hi Marc,

That works as expected. The union also works as expected, as in Query1, but when I add a limit to all edges the performance degrades.

Thanks 

On Sat, 8 May 2021, 8:16 pm , <hadoopmarc@...> wrote:
Hi Vinayak,

What happens with a single clause, so without the union:

g.V().has('property1', 'vertex3').outE().has('property1', 'edge3').inV().has('property1', 'vertex2').limit(100).path().toList()

Best wishes,    Marc


Re: Query Optimisation

Vinayak Bali
 

Hi Marc, 

Yes, all the indexes are available and no warning is thrown while executing the query. I tried debugging with the profile() step: 99% of the time is taken by the union.

Thanks & Regards,
Vinayak

On Sat, 8 May 2021, 7:24 pm , <hadoopmarc@...> wrote:
Hi Vinayak,

To be sure: we are dealing here with a large graph, so all V().has('property1', 'vertex...') steps do hit the index (no index log warnings)? For one, it would be interesting to see the output of the .profile() step.

My earlier suggestion did not make much sense, as it limited the inV() step rather than a full in() step.

Indeed, retrieving just about 50 vertices should return within a second.

Best wishes,   Marc


Re: Query Optimisation

Vinayak Bali
 

Hi Marc, 

I tried the approach you suggested and there is some improvement: earlier it took 2 min, now it takes 1 min 50 sec. Is there any other way to optimize this further, ideally down to milliseconds or a few seconds?

Thanks & Regards,
Vinayak

On Sat, May 8, 2021 at 6:34 PM <hadoopmarc@...> wrote:
Hi Vinayak,

My answer already contains a concrete suggestion: replace all union subclauses starting with outE with the alternate form that has a local(................limit(1)) construct, as indicated.

Marc


Re: Query Optimisation

Vinayak Bali
 

Hi Marc,

Thank you for your reply. I understand the queries are long and hard to read.

Actually, I am not interested in limiting either v1 or v2. I want to apply the limit on edges, and I don't care how many v1 or v2 vertices are returned, as long as the query takes the least time.

Query 1 is the main query, and I want to apply a limit on all edges. The other queries are for reference: they are small variations I have tried. Some work, but they take longer.

Please suggest an optimised way to apply a limit on edges (or maybe on v1 and v2, if required in future) that takes less time.

Thanks & Regards,
Vinayak

On Sat, 8 May 2021, 5:38 pm , <hadoopmarc@...> wrote:
Hi Vinayak,

Can you please try and format your code in a consistent way to ease the reading (even if the editor in this forum is not really helpful in this)? After manual reformatting, Query 1 and the first Query 3 are identical, so I stopped looking at the other queries after that.

I have one suggestion though. If you take one of the subclauses:
    outE().has('property1', 'edge1').limit(100).as('e').inV().has('property1', 'vertex1')

you do not seem interested in returning all v2 vertices. You can therefore limit the number of v2 vertices with:
    outE().has('property1', 'edge1').limit(100).as('e').local(inV().has('property1', 'vertex1').limit(1))

Also see: https://tinkerpop.apache.org/docs/current/reference/#local-step

Best wishes,   Marc


Re: Not able to run queries using spark graph computer from java

hadoopmarc@...
 

Hi Sai,

The blog you mentioned is a bit outdated and is for Spark 1.x. To get an idea of what changes are needed to get OLAP running with Spark 2.x, you can take a look at:
https://tinkerpop.apache.org/docs/current/recipes/#olap-spark-yarn

Best wishes,    Marc


Query Optimisation

Vinayak Bali
 

Hi All, 

g.inject(1).union(V().has('property1', 'vertex1').as('v1').union(outE().has('property1', 'edge1').as('e').inV().has('property1', 'vertex1'),outE().has('property1', 'edge2').as('e').inV().has('property1', 'vertex2')).as('v2'),V().has('property1', 'vertex3').as('v1').union(outE().has('property1', 'edge3').as('e').inV().has('property1', 'vertex2'),outE().has('property1', 'Component_Of').as('e').inV().has('property1', 'vertex1')).as('v2')).limit(100).select('v1','e','v2').by(valueMap().by(unfold()))

This query is returning 100 results of the form (v1,e,v2) and the time taken is in milliseconds.

Rather than returning 100 results of the form (v1,e,v2), I need to return 100 edges of each type. The query is as follows:
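To make the difference in semantics concrete: the first query applies one limit(100) to the combined output, while Query1 is meant to keep up to 100 edges per edge type. The plain-Java sketch below contrasts the two on a toy in-memory list; no Gremlin or JanusGraph is involved, and the `Edge` class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PerTypeLimit {

    // Minimal stand-in for an edge: only its label and an id (hypothetical type).
    static final class Edge {
        final String label;
        final int id;

        Edge(String label, int id) {
            this.label = label;
            this.id = id;
        }
    }

    // One global limit, like a single limit(n) after the union: the first n edges of any label.
    static List<Edge> globalLimit(List<Edge> edges, int n) {
        return edges.stream().limit(n).collect(Collectors.toList());
    }

    // A limit per label, like a separate limit(n) in each branch: up to n edges per label.
    static Map<String, List<Edge>> perLabelLimit(List<Edge> edges, int n) {
        Map<String, List<Edge>> byLabel = new LinkedHashMap<>();
        for (Edge e : edges) {
            List<Edge> bucket = byLabel.computeIfAbsent(e.label, k -> new ArrayList<>());
            if (bucket.size() < n) {
                bucket.add(e);
            }
        }
        return byLabel;
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < 5; i++) edges.add(new Edge("edge1", i));
        for (int i = 0; i < 5; i++) edges.add(new Edge("edge2", i));

        // Global limit of 3: all three results come from edge1.
        System.out.println("global: " + globalLimit(edges, 3).size());
        // Per-label limit of 3: three results for each of edge1 and edge2.
        perLabelLimit(edges, 3).forEach((label, es) -> System.out.println(label + ": " + es.size()));
    }
}
```

With a single global limit, whichever edge type arrives first can use up the whole budget; a per-type limit guarantees each type its own share.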

//Query1

// 2mins vertex1:77, edge1: 36, edge2: 5, vertex2: 105, vertex3: 100, edge3: 100

g.inject(1).union(V().has('property1', 'vertex1').as('v1').union(outE().has('property1', 'edge1').limit(100).as('e').inV().has('property1', 'vertex1'),outE().has('property1', 'edge2').limit(100).as('e').inV().has('property1', 'vertex2')).as('v2'),V().has('property1', 'vertex3').as('v1').union(outE().has('property1', 'edge3').limit(100).as('e').inV().has('property1', 'vertex2'),outE().has('property1', 'Component_Of').limit(100).as('e').inV().has('property1', 'vertex1')).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

But this takes 2 mins to execute, which is not optimal. I tried some other approaches:

//Query2

// 2mins vertex1:77, edge1: 36, edge2: 5, vertex2: 105, vertex3: 100, edge3: 100

g.inject(1).union(V().has('property1', 'vertex1').as('v1').union(outE().has('property1', 'edge1').as('e').inV().has('property1', 'vertex1').limit(100),outE().has('property1', 'edge2').as('e').inV().has('property1', 'vertex2').limit(100)).as('v2'),V().has('property1', 'vertex3').as('v1').union(outE().has('property1', 'edge3').as('e').inV().has('property1', 'vertex2').limit(100),outE().has('property1', 'Component_Of').as('e').inV().has('property1', 'vertex1').limit(100)).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

//Query3

// 529 ms vertex1:77, edge1: 36, edge2: 5, vertex2: 105, vertex3: 100, edge3: 100

g.inject(1).union(V().has('property1', 'vertex1').as('v1').union(outE().has('property1', 'edge1').as('e').inV().has('property1', 'vertex1'),outE().has('property1', 'edge2').as('e').inV().has('property1', 'vertex2')).limit(100).as('v2'),V().has('property1', 'vertex3').as('v1').union(outE().has('property1', 'edge3').as('e').inV().has('property1', 'vertex2'),outE().has('property1', 'Component_Of').as('e').inV().has('property1', 'vertex1')).limit(100).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

//Query3

// 18 sec vertex1:77, edge1: 36, edge2: 5, vertex2: 105, vertex3: 100, edge3: 100

g.inject(1).union(V().has('property1', 'vertex1').as('v1').outE().has('property1', 'edge1').as('e').inV().has('property1', 'vertex1').
limit(100).as('v2'),V().has('property1', 'vertex1').as('v1').outE().has('property1', 'edge2').as('e').inV().has('property1', 'vertex2').
limit(100).as('v2'),V().has('property1', 'vertex3').as('v1').outE().has('property1', 'edge3').as('e').inV().has('property1', 'vertex2').
limit(100).as('v2'),V().has('property1', 'vertex3').as('v1').outE().has('property1', 'Component_Of').as('e').inV().has('property1', 'vertex1').
limit(100).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

//Query4

// 18 sec vertex1:77, edge1: 36, edge2: 5, vertex2: 105, vertex3: 100, edge3: 100

g.inject(1).union(V().has('property1', 'vertex1').as('v1').outE().has('property1', 'edge1').limit(100).as('e').
inV().has('property1', 'vertex1').as('v2'),V().has('property1', 'vertex1').as('v1').outE().
has('property1', 'edge2').limit(100).as('e').inV().has('property1', 'vertex2').as('v2'),V().has('property1', 'vertex3').
as('v1').outE().has('property1', 'edge3').limit(100).as('e').inV().has('property1', 'vertex2').as('v2'),
V().has('property1', 'vertex3').as('v1').outE().has('property1', 'Component_Of').
limit(100).as('e').inV().has('property1', 'vertex1').as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

Query3 performs better, but when the limit is changed it doesn't return the expected result, as shown in the following queries (Query7 is equivalent to Query3, with only the limit changed).

//Query5

// 2mins vertex1:25, edge1: 10, edge2: 5, vertex2: 15, vertex3: 10, edge3: 10

g.inject(1).union(V().has('property1', 'vertex1').as('v1').union(outE().has('property1', 'edge1').limit(10).as('e').inV().has('property1', 'vertex1'),outE().has('property1', 'edge2').limit(10).as('e').inV().has('property1', 'vertex2')).as('v2'),V().has('property1', 'vertex3').as('v1').union(outE().has('property1', 'edge3').limit(10).as('e').inV().has('property1', 'vertex2'),outE().has('property1', 'Component_Of').limit(10).as('e').inV().has('property1', 'vertex1')).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

//Query6

// 2mins vertex1:25, edge1: 10, edge2: 5, vertex2: 15, vertex3: 10, edge3: 10

g.inject(1).union(V().has('property1', 'vertex1').as('v1').union(outE().has('property1', 'edge1').as('e').inV().has('property1', 'vertex1').limit(10),outE().has('property1', 'edge2').as('e').inV().has('property1', 'vertex2').limit(10)).as('v2'),V().has('property1', 'vertex3').as('v1').union(outE().has('property1', 'edge3').as('e').inV().has('property1', 'vertex2').limit(10),outE().has('property1', 'Component_Of').as('e').inV().has('property1', 'vertex1').limit(10)).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

//Query7

// 278 ms vertex1:18, edge1: 8, edge2: 2, vertex2: 12, vertex3: 10, edge3: 10

g.inject(1).union(V().has('property1', 'vertex1').as('v1').union(outE().has('property1', 'edge1').as('e').inV().has('property1', 'vertex1'),outE().has('property1', 'edge2').as('e').inV().has('property1', 'vertex2')).limit(10).as('v2'),V().has('property1', 'vertex3').as('v1').union(outE().has('property1', 'edge3').as('e').inV().has('property1', 'vertex2'),outE().has('property1', 'Component_Of').as('e').inV().has('property1', 'vertex1')).limit(10).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

//Query8

// 18 sec vertex1:25, edge1: 10, edge2: 5, vertex2: 15, vertex3: 10, edge3: 10

g.inject(1).union(V().has('property1', 'vertex1').as('v1').outE().has('property1', 'edge1').as('e').inV().has('property1', 'vertex1').
limit(10).as('v2'),V().has('property1', 'vertex1').as('v1').outE().has('property1', 'edge2').as('e').inV().has('property1', 'vertex2').
limit(10).as('v2'),V().has('property1', 'vertex3').as('v1').outE().has('property1', 'edge3').as('e').inV().has('property1', 'vertex2').
limit(10).as('v2'),V().has('property1', 'vertex3').as('v1').outE().has('property1', 'Component_Of').as('e').
inV().has('property1', 'vertex1').limit(10).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

Also, the limit doesn't affect the execution time: it is the same for both limits.
Please share your view and help me solve this problem in an efficient way.

Thanks & Regards,
Vinayak
