
Re: Configured graph factory not working after making changes to gremlin-server.yaml

Vinayak Bali
 

Hi,

Make the changes in the gremlin-server-cql-es.yaml file.
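For what it's worth, a minimal sketch of the yaml fragment that enables the ConfiguredGraphFactory, per the JanusGraph docs (the properties file path is illustrative):

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties
}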

Thanks 



Re: Configured graph factory not working after making changes to gremlin-server.yaml

Sai Supraj R
 

Hi,

0.5.3

Thanks
Sai



Re: Configured graph factory not working after making changes to gremlin-server.yaml

Vinayak Bali
 

Hi,

Which JanusGraph version is being used?

Regards,
Vinayak



Configured graph factory not working after making changes to gremlin-server.yaml

Sai Supraj R
 

I am trying to use the ConfiguredGraphFactory. I made changes to gremlin-server.yaml and configuration-management.properties. I am getting the following error:

gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[b1b934d6-3f17-40b6-b6cb-fd735c605c5a]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[b1b934d6-3f17-40b6-b6cb-fd735c605c5a] - type ':remote console' to return to local mode
gremlin> ConfiguredGraphFactory.getGraphNames()
gremlin> ConfiguredGraphFactory.open("ConfigurationManagementGraph");
Please create configuration for this graph using the ConfigurationManagementGraph#createConfiguration API.
Type ':help' or ':h' for help.
Display stack trace? [yN]N
gremlin> ConfiguredGraphFactory.create("ConfigurationManagementGraph");
Please create a template Configuration using the ConfigurationManagementGraph#createTemplateConfiguration API.
Type ':help' or ':h' for help.
Display stack trace? [yN]N
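For reference, the two errors above point at the ConfigurationManagementGraph APIs. A minimal sketch of the template route, adapted from the JanusGraph docs (assuming a CQL backend on localhost; the values and the graph name "graph1" are illustrative):

gremlin> map = new HashMap();
gremlin> map.put("storage.backend", "cql");
gremlin> map.put("storage.hostname", "127.0.0.1");
gremlin> ConfiguredGraphFactory.createTemplateConfiguration(new MapConfiguration(map));
gremlin> ConfiguredGraphFactory.create("graph1");

Note that, as far as I understand, "ConfigurationManagementGraph" is the server-side management graph itself, defined in the server yaml, not a graph to open or create by name.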


Re: Mapreduce index repair job fails in Kerberos+SSL enabled cluster

hadoopmarc@...
 

Hi Shiva,

This sounds more like a cluster management question than a JanusGraph question, so my suggested steps are:

  1. try to run a simple MapReduce job using HBase, like HBase's built-in rowcounter example. I assume this will give you the same exception.
  2. hunt down your cluster admins until the example job works.
  3. transfer the configs you made for the rowcounter to the JanusGraph MapReduce job, and hopefully it will work!

Best wishes, Marc


Re: Strange behaviors for Janusgraph 0.5.3 on AWS EMR

asivieri@...
 

Hi Marc,

Yes, the deployMode was specified in the Gremlin Console and not in the properties file, as in the TinkerPop example, so that's why it was not explicit here.
I am not sure why EMR would be limiting anything, since other Spark applications spawn more executors. But I am still investigating this; I will compare the entire properties list (which is reported in the Spark UI as well), as maybe something is different.

For the output folder, yes, it is working correctly in a way: I tried executing the CloneVertexProgram and it creates 768 files, all empty. And by zero I mean 0, while any other query (such as valueMap()) returns nothing at all.

Best regards,
Alessandro


Re: P.neq() predicate uses wrong ES mapping

hadoopmarc@...
 

https://github.com/JanusGraph/janusgraph/issues/2588

For further explicitness I added the following example:

gremlin> g.V().has('x', neq('lion')).elementMap()
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]
==>[id:4192,label:Some,x:watch the dog]




Re: P.neq() predicate uses wrong ES mapping

hadoopmarc@...
 

Hi Sergey,

We mere mortals skimming the questions in this forum often need very explicit examples to fully grasp a point. The transcript below, expanding on the earlier one, shows the exact consequence of your statement: 'problem is that Janusgraph uses tokenised field for "neq" comparisons and non tokenised for "eq".'

According to the ref docs, the eq(), neq(), textPrefix(), textRegex() and textFuzzy() predicates should apply to STRING search (so to the non-tokenized field).

gremlin> g.addV('Some').property('x','watch the dog')
==>v[4192]
gremlin> g.tx().commit()
==>null
gremlin> g.V().elementMap()
10:03:40 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>[id:4192,label:Some,x:watch the dog]
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]

gremlin> g.V().has('x', eq('watch')).elementMap()
gremlin>
gremlin> g.V().has('x', eq('watch the dog')).elementMap()
==>[id:4192,label:Some,x:watch the dog]

gremlin> g.V().has('x', neq('watch the dog')).elementMap()
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]

gremlin> g.V().has('x', neq('watch')).elementMap()
==>[id:4264,label:Some,x:x2,y:??]
==>[id:4224,label:Some,x:x1,y:y1]
// Here, ==>[id:4192,label:Some,x:watch the dog] is missing, supporting Sergey's issue!!!
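For anyone reproducing this, a minimal schema sketch (the property, index, and backend names 'x', 'mixedOnX' and 'search' are hypothetical) that makes the two fields explicit:

gremlin> mgmt = graph.openManagement()
gremlin> x = mgmt.makePropertyKey('x').dataType(String.class).make()
gremlin> // TEXTSTRING indexes both a tokenized (TEXT) and a non-tokenized (STRING) field
gremlin> mgmt.buildIndex('mixedOnX', Vertex.class).addKey(x, Mapping.TEXTSTRING.asParameter()).buildMixedIndex('search')
gremlin> mgmt.commit()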

Related to this, there is no negation of the textContains() predicate for full TEXT search, and using the generic TinkerPop predicate TextP.notContaining() causes JanusGraph to not use the index.

I will post an issue on github referring to this thread.

Best wishes, Marc


Re: Union Query Optimization

AMIYA KUMAR SAHOO
 

Hi Vinayak,

I am not sure how to improve this query further through Gremlin.

The query can be made faster through the data model. A vertex-centric index (VCI) will be helpful if you are applying any other filter along with hasLabel and your edge selectivity is low compared to the total degree of those vertices.

If this query is very frequent and there is a need to improve it further, you can make the inV title property part of the edge, and a VCI can be enabled on that edge property, as sketched below.

Other than that, I am not sure whether any configuration change can improve it further. Someone else might comment on this front.
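For example, a rough sketch of what that could look like (the property and index names 'inTitle' and 'e1ByInTitle' are hypothetical; the edge label 'E1' is from your query):

mgmt = graph.openManagement()
e1 = mgmt.getEdgeLabel('E1')
// a copy of the in-vertex title, to be written onto the edge at load time
inTitle = mgmt.makePropertyKey('inTitle').dataType(String.class).make()
// vertex-centric index: outE() filters on inTitle without scanning all incident edges
mgmt.buildEdgeIndex(e1, 'e1ByInTitle', Direction.OUT, inTitle)
mgmt.commit()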


Regards,
Amiya




Mapreduce index repair job fails in Kerberos+SSL enabled cluster

shivainfotech12@...
 

Hi All,

I'm trying to run an index repair job through MapReduce in a Kerberos+SSL enabled cluster.
I have added all the required HBase and Hadoop configurations, but I am getting the exception below in the MapReduce logs.

2021-04-22 20:19:55,112 DEBUG [hconnection-0x3bbf9027-metaLookup-shared--pool2-t1] org.apache.hadoop.hbase.security.HBaseSaslRpcClient: Creating SASL GSSAPI client. Server's Kerberos principal name is sudarshan/inedccpe101.informatica.com@...
2021-04-22 20:19:55,113 DEBUG [hconnection-0x3bbf9027-metaLookup-shared--pool2-t1] org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException as:sudarshan (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2021-04-22 20:19:55,113 DEBUG [hconnection-0x3bbf9027-metaLookup-shared--pool2-t1] org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:sudarshan (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleSaslConnectionFailure(RpcClientImpl.java:643)
2021-04-22 20:19:55,113 WARN [hconnection-0x3bbf9027-metaLookup-shared--pool2-t1] org.apache.hadoop.hbase.ipc.RpcClientImpl: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2021-04-22 20:19:55,113 ERROR [hconnection-0x3bbf9027-metaLookup-shared--pool2-t1] org.apache.hadoop.hbase.ipc.RpcClientImpl: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:220)
        at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:909)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1244)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:35396)
        at org.apache.hadoop.hbase.client.ClientSmallReversedScanner$SmallReversedScannerCallable.call(ClientSmallReversedScanner.java:298)
        at org.apache.hadoop.hbase.client.ClientSmallReversedScanner$SmallReversedScannerCallable.call(ClientSmallReversedScanner.java:276)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:364)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:338)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:137)
        at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:162)
        at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
        at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:189)
        at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:201)
        ... 25 more
 

Can anyone please help with this issue?

Thanks
Shiva


Re: Strange behaviors for Janusgraph 0.5.3 on AWS EMR

hadoopmarc@...
 

Hi Alessandro,

The executors tab of the Spark UI shows the product of spark.executor.instances times spark.executor.cores. I guess spark.executor.instances defaults to one, and EMR might limit the number of executor cores?

It also won't hurt to explicitly specify spark.submit.deployMode=client, assuming EMR allows it. I am not sure whether the Gremlin Console needs client mode to have the count results returned. And with a "zero" result in the Gremlin Console, did you mean 0 or just ==> ?

To have output written to "output", you have to configure distributed storage so that "output" is a path on Hadoop HDFS (each executor writes its output to a partition on the distributed storage, so you would have 768 partitions in the output directory). Be aware that TinkerPop uses somewhat strange naming in the output directory.
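For instance, a rough sketch of the relevant lines in the properties file (the values and the HDFS URI are illustrative):

spark.master=yarn
spark.submit.deployMode=client
spark.executor.instances=12
spark.executor.cores=4
spark.executor.memory=20g
# write results to distributed storage instead of the local filesystem
gremlin.hadoop.outputLocation=hdfs://namenode:8020/tmp/output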

Best wishes, Marc


Re: Strange behaviors for Janusgraph 0.5.3 on AWS EMR

asivieri@...
 

By the way, if you have any properties file or running example of OLAP that you would like to share, I'd be happy to see something working and compare it to what I am trying to do!

Best regards,
Alessandro


Re: Strange behaviors for Janusgraph 0.5.3 on AWS EMR

asivieri@...
 

Hi,

Here are the properties that I am setting so far (plus the same ones that are set in the TinkerPop example, such as the classpath for the executors and the driver):
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
 
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
 
schema.default=none
 
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.batch-loading=true
janusgraphmr.ioformat.conf.storage.buffer-size=10000
janusgraphmr.ioformat.conf.storage.cql.keyspace=...
 
janusgraphmr.ioformat.conf.storage.hostname=...
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.username=...
janusgraphmr.ioformat.conf.storage.password=...
cassandra.output.native.port=9042
 
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true
 
spark.master=yarn
spark.executor.memory=20g
spark.executor.cores=4
spark.driver.memory=20g
spark.driver.cores=8
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
gremlin.spark.graphStorageLevel=MEMORY_AND_DISK
gremlin.spark.persistContext=true
gremlin.spark.persistStorageLevel=MEMORY_AND_DISK
spark.default.parallelism=1000
In the Spark UI I can see that the number of tasks for the first job matches the number of tokens in our Scylla cluster (256 tokens per node * 3 nodes), but only two executors are spawned, even though I tried on a cluster with 96 cores and 768 GB of RAM which, given the driver and executor configuration you can see in the properties, should allow a lot more than 2.

Moreover, I wrote a dedicated Java application that replicates the first step of the SparkGraphComputer, the step where the entire vertex list is read into an RDD. Basically, I tried skipping the Gremlin Console entirely, starting a "normal" Spark session as we do in our applications, and then reading the entire vertex list from Scylla. In this case the job has the same number of tasks as before, but the number of executors is the one I expected, so it seems to me that something in the Spark context creation performed by Gremlin is limiting this number; maybe I am missing a configuration.
The problem of empty results, however, remained: in this test the output RDD is completely empty, even though the DEBUG logs show that it is connecting to the correct keyspace, where data is present. There are no exceptions, so I am not sure why we are not reading anything. Am I missing some properties, in your opinion/experience?
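For context, the Gremlin Console side follows the standard TinkerPop pattern, sketched here with an illustrative properties path:

graph = GraphFactory.open('conf/hadoop-graph/read-cql.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()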

Best regards,
Alessandro


Re: Strange behaviors for Janusgraph 0.5.3 on AWS EMR

hadoopmarc@...
 

Hi Alessandro,

Yes, please include the properties file.

To be clear, you see in the Spark web UI:
spark.master=yarn
spark.executor.instances=12

and only two executors show up for 700+ tasks, while other jobs using the same EMR account spawn tens of executors? Is there any YARN queue you have to specify to get more resources from YARN? It sounds like some limit in the YARN ResourceManager.
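If so, a sketch of what to add to the properties file (the queue name is hypothetical):

spark.yarn.queue=somequeue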

Best wishes, Marc


Re: Union Query Optimization

Vinayak Bali
 

Hi Amiya, 

Thank you for the query. It also increased the performance, but it's still 35 seconds. Is there any other way to optimize it further? Only 10 records are returned by the query.
The counts are as follows:
V1: 187K, V2: 40, V3: 50, V4: 447K

Thanks & Regards,
Vinayak



Re: Union Query Optimization

AMIYA KUMAR SAHOO
 

Hi Vinayak,

You can try the query below; it can use the index and combine as many traversals as you want.

g.inject(1).
   union(
      V().has('title', 'V1').outE().hasLabel('E1').inV().has('title', 'V2'),
      V().has('title', 'V3').outE().hasLabel('E3').inV().has('title', 'V4'))....

Regards,
Amiya





Re: Union Query Optimization

Vinayak Bali
 

Hi cmilowka,

The property title has a composite index created on it. I further modified the query as follows:

g.V().has('title', within('V1','V2')).
   union(
      has('title', 'V1').as('v1').outE().hasLabel('E1').as('e').inV().has('title', 'V2').as('v2'),
      has('title', 'V2').as('v1').
         union(
            outE().hasLabel('E2').as('e').inV().has('title', 'V2'),
            outE().hasLabel('E3').as('e').inV().has('title', 'V3')).as('v2')).
   select('v1','e','v2').by(valueMap().by(unfold()))

The only change is adding has('title', within('V1','V2')) at the start of the query. The warning is gone now and performance has also improved.
Earlier the time taken was around 3.5 minutes; now it's 55 seconds to return only 44 records.
The problem is that my source changes, and I need to account for that. For example:
v1 - e1 - v2
v3 - e2 - v4
I want both in a single query. The query for this will now be as follows:

g.V().has('title', within('V1','V3')).
   union(
      has('title', 'V1').as('v1').outE().has('title', 'E1').as('e').inV().has('title', 'V2').as('v2'),
      has('title', 'V3').as('v1').outE().has('title', 'E2').as('e').inV().has('title', 'V4').as('v2')).
   select('v1','e','v2').by(valueMap().by(unfold()))

I request all of you to provide feedback to improve it further.

Thanks & Regards,
Vinayak



Re: Union Query Optimization

cmilowka
 

I guess building a composite index for the 'title' property will do the job of accessing title(V1) and title(V2) fast, without the full scan of the DB that currently happens. A rough sketch below.
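The index name 'byTitle' is hypothetical; for a pre-existing 'title' key the index would also need to be enabled and reindexed:

mgmt = graph.openManagement()
title = mgmt.getPropertyKey('title')
mgmt.buildIndex('byTitle', Vertex.class).addKey(title).buildCompositeIndex()
mgmt.commit()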

cheers, CM


Re: Strange behaviors for Janusgraph 0.5.3 on AWS EMR

kndoan94@...
 

Hi Alessandro,

I'm also working through a similar use-case with AWS EMR, but I'm running into some Hadoop class errors. What version of EMR are you using?

Additionally, if you could pass along the configuration details in your .properties file, that would be extremely helpful :) 

Thank you!
Ben


Re: Strange behaviors for Janusgraph 0.5.3 on AWS EMR

asivieri@...
 

Hi Marc,

The TinkerPop example works correctly. We are actually using Scylla, and with 256 tokens per node I am getting 768 tasks in the Spark job (which I correctly see listed in the UI). The problems I have are that a) only 2 executors are spawned, which does not make much sense since I have configured executor cores and memory in the properties file and the cluster has resources for more than 2, and b) no data is transmitted back from the cluster, even though similar (limited) queries without Spark produce results.

Best regards,
Alessandro
