
Re: JanusGraph 0.6.0 traversal change?

Clement de Groc
 

Yes. JanusGraph 0.6.0 upgraded Apache TinkerPop to 3.5.1, which requires an anonymous traversal (spawned from the __ class) for child traversals in such cases.
This is mentioned in their "Upgrade for users" guide here: https://tinkerpop.apache.org/docs/current/upgrade/#_anonymous_child_traversals


Re: JanusGraph 0.6.0 traversal change?

criminosis@...
 

Ahh I see, I guess this is the intended usage now?

gremlin> g.addV()
==>v[16496]
gremlin> g.addV()
==>v[4096]
gremlin> g.V(16496).addE('user_alias').to(__.V(4096))
==>e[4cu-cq8-xzp-35s][16496-user_alias->4096]
gremlin>
Basically doing "__.V" instead of "g.V" for the child traversal?


JanusGraph 0.6.0 traversal change?

criminosis@...
 

When running with 0.5.2 I was able to do this traversal to add an edge between two vertices:

gremlin> g.addV()
==>v[8200]
gremlin> g.addV()
==>v[4336]
gremlin> g.V(8200).addE('my_edge_label').to(g.V(4336))

But when doing it through 0.6.0 I get this now:

gremlin> g.V(8200).addE('my_edge_label').to(g.V(4336))
The child traversal of [GraphStep(vertex,[4336])] was not spawned anonymously - use the __ class rather than a TraversalSource to construct the child traversal
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.IllegalStateException: The child traversal of [GraphStep(vertex,[4336])] was not spawned anonymously - use the __ class rather than a TraversalSource to construct the child traversal
at org.apache.tinkerpop.gremlin.process.traversal.Bytecode.convertArgument(Bytecode.java:302)
at org.apache.tinkerpop.gremlin.process.traversal.Bytecode.flattenArguments(Bytecode.java:287)
at org.apache.tinkerpop.gremlin.process.traversal.Bytecode.addStep(Bytecode.java:94)
at org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal.to(GraphTraversal.java:1145)
at org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal$to$4.call(Unknown Source)
at Script148.run(Script148.groovy:1)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:676)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:378)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:233)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:272)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
gremlin>

After fiddling with it though I noticed I was able to do this:

gremlin> g.V(4336).as('test').V(8200).addE('my_edge_label').to('test')
==>e[2dd-6bs-xzp-3cg][8200-my_edge_label->4336]

However, passing just the vertex id is not permitted, which makes sense given it's just an integer with no context.

gremlin> g.V(8200).addE('my_edge_label').to(4336)
No signature of method: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal.to() is applicable for argument types: (Integer) values: [4336]
Possible solutions: is(java.lang.Object), take(int), tap(groovy.lang.Closure), by(groovy.lang.Closure), drop(int), any()

I've been looking through the 0.6.0 milestone and found this issue, but that seemed more about a documentation change in 0.6.0 than a code change. Environment-wise, I'm just running these in a Gremlin console session within a docker-compose environment with Cassandra and Elasticsearch as backends.

Just wondering if the change here is intentional? It also seemed odd that it was suggesting to use the "__" class.


Re: Janusgraph embedded multi instance(JVM) data sync issue

hadoopmarc@...
 

Hi Pawan,

Interesting, I could not find a JanusGraph unit test for this basic scenario (there is one with two instances and an index, though). This needs more investigation.

Meanwhile, are you sure that you have no hidden configs for caching in the Spring Framework REST service?

Best wishes,    Marc


Re: Indexing Strategies for RDF edges/predicates on Janusgraph

Matthew Nguyen <nguyenm9@...>
 

Thanks for the clarification. That makes sense. I suppose if there were multiple edges from V(h) to V('mother'), I would need to qualify it to ensure the path exists?

e.g. V(h).out('mother').inE('isSon') vs V(h).out('mother').inE('battled')



-----Original Message-----
From: AMIYA KUMAR SAHOO <amiyakr.sahoo91@...>
To: janusgraph-users@...
Sent: Mon, Jan 24, 2022 2:26 pm
Subject: Re: [janusgraph-users] Indexing Strategies for RDF edges/predicates on Janusgraph

Hi Matthew,

The two examples show two different types of default index.

g.V(h).out('mother')
- This is an example of the default vertex-centric index per edge label.
- It helps to quickly traverse one specific type of edge among many different edge types.
- In your case, finding all employees employedBy a company will use this index.

g.V(h).values('age')
- This is an example of the default vertex-centric index per property key.
- It helps to get the value of a single property among several properties of a single vertex.


Now consider a situation where 1k types of edges are attached to one vertex (a company). Apart from the employedBy edge, the other edge types have low cardinality (say < 10), but 2k employees are employedBy that company. You want to find out whether the company has an employee named John. If your traversal starts from the company and follows the employedBy edges, it has to traverse all 2k edges to find out whether John is an employee. This can be made faster if the employee name is available on the edge and a vertex-centric index (VCI) is enabled on it.

This might not be a very good example, as it can be optimised in different ways:
1) If the employee has a low degree for the employedBy edge, you can start the traversal from the employee vertex.

Hope it helps,
Amiya

On Tue, 25 Jan 2022, 00:01 Matthew Nguyen via lists.lfaidata.foundation, <nguyenm9=aol.com@...> wrote:
Hi Amiya, I saw that but wasn't quite sure the intent given the example.  It talks about edge labels but the examples are vertices & values?
g.V(h).out('mother') -> returns a vertex traversal?
g.V(h).values('age') -> returns a Value?
Also, what do you mean by 'But if you have a high cardinality for a single edge type, then you have to manually create edge index on respective property.'?  
thx, matt


Re: Indexing Strategies for RDF edges/predicates on Janusgraph

AMIYA KUMAR SAHOO
 

Hi Matthew,

The two examples show two different types of default index.

g.V(h).out('mother')
- This is an example of the default vertex-centric index per edge label.
- It helps to quickly traverse one specific type of edge among many different edge types.
- In your case, finding all employees employedBy a company will use this index.

g.V(h).values('age')
- This is an example of the default vertex-centric index per property key.
- It helps to get the value of a single property among several properties of a single vertex.


Now consider a situation where 1k types of edges are attached to one vertex (a company). Apart from the employedBy edge, the other edge types have low cardinality (say < 10), but 2k employees are employedBy that company. You want to find out whether the company has an employee named John. If your traversal starts from the company and follows the employedBy edges, it has to traverse all 2k edges to find out whether John is an employee. This can be made faster if the employee name is available on the edge and a vertex-centric index (VCI) is enabled on it.

This might not be a very good example, as it can be optimised in different ways:
1) If the employee has a low degree for the employedBy edge, you can start the traversal from the employee vertex.
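
The difference between the default per-label index and a vertex-centric index on an edge property can be illustrated with a toy in-memory model. This is only a sketch in plain Python, not JanusGraph code: the Company class, its dictionaries, and the idea of storing the employee name on the edge are assumptions made for illustration.

```python
import bisect
from collections import defaultdict


class Company:
    """Toy vertex that keeps its incident edges grouped by label."""

    def __init__(self):
        # Default behaviour: edges are grouped per label, so selecting one
        # label never touches the edges of the other labels.
        self.edges_by_label = defaultdict(list)
        # Sketch of a vertex-centric index: per label, the indexed edge
        # property (here the employee name) is kept in sorted order.
        self.sorted_names_by_label = defaultdict(list)

    def add_edge(self, label, name):
        self.edges_by_label[label].append(name)
        bisect.insort(self.sorted_names_by_label[label], name)

    def has_edge_scan(self, label, name):
        """Without a property index: walk the edges of the label one by one."""
        visited = 0
        for candidate in self.edges_by_label[label]:
            visited += 1
            if candidate == name:
                return True, visited
        return False, visited

    def has_edge_indexed(self, label, name):
        """With a sorted index: binary search instead of a full walk."""
        names = self.sorted_names_by_label[label]
        pos = bisect.bisect_left(names, name)
        return pos < len(names) and names[pos] == name


company = Company()
for i in range(2000):
    company.add_edge("employedBy", f"employee{i:04d}")
company.add_edge("employedBy", "John")
company.add_edge("isSoldBy", "product1")

print(company.has_edge_scan("employedBy", "John"))     # (True, 2001)
print(company.has_edge_indexed("employedBy", "John"))  # True
```

The scan only ever looks at employedBy edges, which mirrors the default per-label behaviour, but it still visits every one of them; the sorted structure is what a property-level index adds on top.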

Hope it helps,
Amiya


On Tue, 25 Jan 2022, 00:01 Matthew Nguyen via lists.lfaidata.foundation, <nguyenm9=aol.com@...> wrote:

Hi Amiya, I saw that but wasn't quite sure the intent given the example.  It talks about edge labels but the examples are vertices & values?

g.V(h).out('mother') -> returns a vertex traversal?
g.V(h).values('age') -> returns a Value?

Also, what do you mean by 'But if you have a high cardinality for a single edge type, then you have to manually create edge index on respective property.'?  

thx, matt


Re: Indexing Strategies for RDF edges/predicates on Janusgraph

Matthew Nguyen <nguyenm9@...>
 

Hi Amiya, I saw that but wasn't quite sure the intent given the example.  It talks about edge labels but the examples are vertices & values?

g.V(h).out('mother') -> returns a vertex traversal?
g.V(h).values('age') -> returns a Value?

Also, what do you mean by 'But if you have a high cardinality for a single edge type, then you have to manually create edge index on respective property.'?  

thx, matt


Re: Indexing Strategies for RDF edges/predicates on Janusgraph

AMIYA KUMAR SAHOO
 

Hi Matthew,


As per the note below from the JanusGraph docs, even if a company has 1k different types of edge attached to it, traversing by edge label will be fast.

For example, finding the employees employedBy (edge label) a company.

But if you have a high cardinality for a single edge type, then you have to manually create edge index on respective property.

JanusGraph automatically builds vertex-centric indexes per edge label and property key. That means, even with thousands of incident battled edges, queries like g.V(h).out('mother') or g.V(h).values('age') are efficiently answered by the local index.


Thanks,
Amiya


On Mon, 24 Jan 2022, 12:32 , <hadoopmarc@...> wrote:
Hi Matthew,

It would be possible to replace the employedBy, isSoldBy, isLeasedBy relations with a relatedToCompany relation with employment, selling and lease properties. But I do not see any advantages compared to the original model, because there is nothing wrong with having a lot of frequently used vertex-centric indices, and the original model is easier to use.

Cheers,     Marc


Re: Indexing Strategies for RDF edges/predicates on Janusgraph

hadoopmarc@...
 

Hi Matthew,

It would be possible to replace the employedBy, isSoldBy, isLeasedBy relations with a relatedToCompany relation with employment, selling and lease properties. But I do not see any advantages compared to the original model, because there is nothing wrong with having a lot of frequently used vertex-centric indices, and the original model is easier to use.

Cheers,     Marc


Indexing Strategies for RDF edges/predicates on Janusgraph

Matthew Nguyen <nguyenm9@...>
 

Hi, I am trying to build a triplestore on top of JG. The general model is:

Vertex (subject or object) Properties:
  Label
  Value (IRI, Literal) - indexed

Edge (predicate) Properties:
  Label (predicate)
  hash - effectively a unique hash of predicate so I can globally index it

So effectively we can have Vertex(subject) -> Edge (predicate) -> Vertex(object)

Let's assume I insert the following triples into this model

<matt> <employedBy> <some_company>
<jane> <employedBy> <some_company>
<product1> <isSoldBy> <some_company>
<some_office> <isLeasedBy> <some_company>
etc

Let's say there are literally 1k different predicates that can be associated with <some_company>, and predicates like <employedBy> can have high cardinality if the company is large. What's a good way to index these edges/predicates so I can quickly query for all edges/predicates of a particular type on <some_company> (e.g. 'give me all the ?people <employedBy> <some_company>')?

I'm aware of the vertex-centric indexes on edges but it appears I would need to build an index for each of the possible edge labels of <some_company> if I understand the docs correctly (https://docs.janusgraph.org/schema/index-management/index-performance/#edge-indexes).  Please correct me if I'm wrong.  If not, is there another strategy I can use?

thx, matt


Re: Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas
 

Hi Marc,

Thanks for your suggestion,

However, I am testing it on a local environment with a replication factor of one. I believe that if the replication factor is one, then in all cases it should give me the same data/information in other instances as well.

Screenshot 2022-01-23 at 5.11.26 PM.png

See the local properties file below:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1
storage.cql.keyspace=janusgraph
storage.port=9042
schema.constraints=true
############ CQL Properties ############

storage.cql.read-consistency-level=LOCAL_QUORUM
storage.cql.write-consistency-level=LOCAL_QUORUM
storage.cql.replication-factor=1
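
As a rough aid for reasoning about the consistency settings above, the usual Cassandra quorum arithmetic can be sketched as follows. The function names are illustrative only and are not part of JanusGraph or any Cassandra driver; the sketch assumes a single datacenter, so that LOCAL_QUORUM and QUORUM cover the same replicas.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


# With replication-factor=1 a quorum is a single replica, so quorum reads
# and quorum writes always overlap at the storage layer.
rf = 1
q = quorum(rf)
print(q, reads_see_latest_write(q, q, rf))  # 1 True
```

This covers only replica consistency inside Cassandra; caching in the application or in each embedded instance is a separate question.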

Please see the attached API code for create, update, and get in the local sample application. Let me know if something is wrong here, because the data refresh is not working on another embedded instance with the same configuration.

Thanks,
Pawan
 

On Thu, Jan 20, 2022 at 12:44 PM <hadoopmarc@...> wrote:
Hi Pawan,

You are right, if issues already arise without index, you should investigate that first, even though a large graph without indices is useless in itself.
See the third question from Boxuan Li above, in particular:
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlAboutDataConsistency.html

Best wishes,   Marc



--
Thanks & Regard

PAWAN SHRIWAS


[ANNOUNCE] JanusGraph 0.6.1 Release

Oleksandr Porunov
 

The JanusGraph Technical Steering Committee is excited to announce the release of JanusGraph 0.6.1.

JanusGraph is an Apache TinkerPop enabled property graph database with support for a variety of storage and indexing backends. Thank you to all of the contributors.

The release artifacts can be found at this location:
    https://github.com/JanusGraph/janusgraph/releases/tag/v0.6.1

A full binary distribution is provided for user convenience:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.6.1/janusgraph-full-0.6.1.zip
 
A truncated binary distribution is provided:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.6.1/janusgraph-0.6.1.zip

The online docs can be found here:
    https://docs.janusgraph.org
 
To view the resolved issues and commits check the milestone here:
    https://github.com/JanusGraph/janusgraph/milestone/22?closed=1

Thank you very much,
Oleksandr Porunov


Re: High HBase backend 'configuration' row contention

hadoopmarc@...
 

Hi Tendai,

"Not serializable" sounds as if you pass a JanusGraph instance from the Spark driver to the executor. The function that runs on the Spark executor should call some static function on the singleton object that holds the JanusGraph instance. If the singleton object is called for the first time, locally on each Spark executor, it creates the JanusGraph instance and its static convenience method returns a GraphTraversalSource g. If the executor function runs a second time (on the next partition of your RDD or DataFrame as input) it again calls the convenience function on the singleton object, but now gets a GraphTraversalSource returned from the existing JanusGraph instance.
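
The lazily initialised, per-executor singleton described above can be sketched as follows. This is a language-neutral illustration in plain Python rather than Spark or JanusGraph code: open_graph is a hypothetical stand-in for whatever opens the JanusGraph instance, and in a JVM job the same role would be played by a static holder object.

```python
import threading

_lock = threading.Lock()
_graph = None      # one instance per executor process, created lazily
open_calls = 0     # counter used only to show how often we really open


def open_graph():
    """Hypothetical stand-in for opening a graph instance (expensive)."""
    global open_calls
    open_calls += 1
    return {"name": "graph-instance"}


def get_graph():
    """Return the process-wide instance, creating it on first use only."""
    global _graph
    if _graph is None:
        with _lock:
            if _graph is None:  # re-check after acquiring the lock
                _graph = open_graph()
    return _graph


def process_partition(rows):
    """Function shipped to the executors. It captures no graph object,
    so nothing graph-related has to be serialized along with it."""
    graph = get_graph()
    return [(row, graph["name"]) for row in rows]


# Two partitions handled by the same process reuse the same instance.
first = process_partition([1, 2])
second = process_partition([3])
print(open_calls)  # 1
```

The key point is that the driver only ships the function; the graph instance is created on the executor the first time the function runs there.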

Best wishes,     Marc


Re: High HBase backend 'configuration' row contention

Tendai Munetsi
 

Hi Marc,

Thanks for the feedback and suggestion. We investigated applying the JanusGraphFactory inside a singleton object as you've suggested, but ran into the issue that the JanusGraphFactory is not serializable as required for Spark singletons. Do you have any ideas of how to get around this issue?

Thanks,
Tendai


Re: Janusgraph embedded multi instance(JVM) data sync issue

hadoopmarc@...
 

Hi Pawan,

You are right, if issues already arise without index, you should investigate that first, even though a large graph without indices is useless in itself.
See the third question from Boxuan Li above, in particular:
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlAboutDataConsistency.html

Best wishes,   Marc


Re: JG Schema - addConnection seem to create duplicate connections

Peter Molnar
 

Hi Marc,

Thanks a lot for looking into this. As requested, I filed an issue about this on GitHub: https://github.com/JanusGraph/janusgraph/issues/2950

Thanks,
Peter


Re: Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas
 

Hi Marc,

I don't think the data caching is due only to Elasticsearch / the mixed index. I have seen this on a basic property/node without an index as well. I suggest we first work on the basic node/property case, and then we can plan for the mixed index cases.

Any suggestions for the basic case without an index backend?

Thanks,
Pawan


On Sat, Jan 15, 2022 at 5:16 PM <hadoopmarc@...> wrote:
Hi Pawan,

OK, let's investigate further. You say that the issue occurs for both vertex creation and modification. Let's take the clearest case first: vertex creation with an indexed property. So, in your system setup, if you have added a new vertex with embedded instance1, it sometimes takes a minute or more before a query for this vertex (based on its property value) on instance2 returns the vertex. This can only mean that the Elasticsearch index sometimes does not return the new property value. This in turn means that an Elasticsearch replica has not yet been synced with the data about the new vertex.

Indeed, the janusgraph-elastic configs have a key index.[X].elasticsearch.bulk-refresh (default: false) which can be set to any of the values in:
https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-refresh.html

One can check the correspondence between this janusgraph config item and the elasticsearch API parameter in:
https://github.com/JanusGraph/janusgraph/blob/v0.6.0/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/rest/RestElasticSearchClient.java

So, can you see what happens with the other possible values for index.[X].elasticsearch.bulk-refresh?

Best wishes,    Marc



--
Thanks & Regard

PAWAN SHRIWAS


Re: JG Schema - addConnection seem to create duplicate connections

hadoopmarc@...
 

Hi Peter,

Thanks for reporting. I think it is a bug. I checked with the standalone gremlin REPL of janusgraph-0.6.0, using:
graph = JanusGraphFactory.open('conf/janusgraph-inmemory.properties')

This gives the same results and if you add the from toEdge connections first, the FromEdge gets 4 connections.

You can check that two of the four connections are redundant, that is, they refer to the same edge in the schema:

gremlin> edges[1].mappedConnections()
==>org.janusgraph.core.Connection@1fecfaea
==>org.janusgraph.core.Connection@4872669f
==>org.janusgraph.core.Connection@483f286e
==>org.janusgraph.core.Connection@4bb147ec
gremlin> edges[1].mappedConnections()[0].getConnectionEdge()
==>e[hs0-el-1th-st][525-~T$SchemaRelated->1037]
gremlin> edges[1].mappedConnections()[1].getConnectionEdge()
==>e[ikg-el-1th-171][525-~T$SchemaRelated->1549]
gremlin> edges[1].mappedConnections()[2].getConnectionEdge()
==>e[hs0-el-1th-st][525-~T$SchemaRelated->1037]
gremlin> edges[1].mappedConnections()[3].getConnectionEdge()
==>e[ikg-el-1th-171][525-~T$SchemaRelated->1549]

Finally, I checked that the schema results remain the same if you add the following config properties to the graph (as suggested by the ref docs):
schema.default=none
schema.constraints=true

Can you please report this as an issue on: https://github.com/JanusGraph/janusgraph/issues

Best wishes,   Marc


On Tue, Jan 11, 2022 at 01:06 PM, Peter Molnar wrote:
mgmt = graph.openManagement();


Re: Fastest way to check if a property key is mixed indexed or not

hadoopmarc@...
 

Hi Harshit,

The performance impact for JanusGraph of including a property key in multiple mixed indices is negligible (the selection of the index for a specific query will be a tad slower). Additional mixed indices imply a heavier load on the indexing backend (in particular memory and storage, and CPU during inserts), but with little impact on response times if the cluster is dimensioned properly.

Marc


Re: Janusgraph embedded multi instance(JVM) data sync issue

hadoopmarc@...
 

Hi Pawan,

OK, let's investigate further. You say that the issue occurs for both vertex creation and modification. Let's take the clearest case first: vertex creation with an indexed property. So, in your system setup, if you have added a new vertex with embedded instance1, it sometimes takes a minute or more before a query for this vertex (based on its property value) on instance2 returns the vertex. This can only mean that the Elasticsearch index sometimes does not return the new property value. This in turn means that an Elasticsearch replica has not yet been synced with the data about the new vertex.

Indeed, the janusgraph-elastic configs have a key index.[X].elasticsearch.bulk-refresh (default: false) which can be set to any of the values in:
https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-refresh.html

One can check the correspondence between this janusgraph config item and the elasticsearch API parameter in:
https://github.com/JanusGraph/janusgraph/blob/v0.6.0/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/rest/RestElasticSearchClient.java

So, can you see what happens with the other possible values for index.[X].elasticsearch.bulk-refresh?
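
For example, assuming the mixed index is registered under the name "search" (an assumption; substitute your own index name for [X]), the setting could be tried in the properties file as sketched below. The values true, false and wait_for are the ones listed for the refresh parameter in the Elasticsearch documentation linked above.

```
# Assumed index name "search"; replace with the name used in your configuration.
# Possible values per the Elasticsearch refresh documentation: true, false, wait_for
index.search.elasticsearch.bulk-refresh=wait_for
```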

Best wishes,    Marc
