Forcing Janusgraph to use indices when performing traversal with Union step

brad@...
 

Hello,

I need to execute JanusGraph traversals that use a union step containing multiple 'has' steps, each of which tests an indexed property for equality with a value. Before the union step there is another 'has' step, which also tests an indexed property.

For example:

    g.V().has('indexed-prop1', 'value1').union(has('indexed-prop2', 'value2'), has('indexed-prop3', 'value3'))

How can I verify whether or not the traversal is using indices 'indexed-prop2' and 'indexed-prop3'?  I suspect that it is not.  The traversal seems to take a long time to run; however, when I simplify the traversal, by having only one 'has' step within the union, it runs very quickly.

What is the best way to execute this type of traversal? My application is written in Java.

Thanks in advance,
Brad


Re: dynamic graphics, limits and global index

Matthew Nguyen
 

Hi Marc, I follow what you're saying but will point out it doesn't have to play out to exabytes. Reason is that IDs are non recyclable and there are losses due to bulk load reservations and deletions. I'm looking at JG from a 3store pov where a trillion triples isn't out of reach when it comes to knowledge graphs. Hearing that limits are scoped to the graph makes it much more palatable. 

thx, matt


Re: Updating our business info on your site

Misha Brukman
 

Hi Rosy,

Thanks for reaching out! Please let us know what needs to be changed and how, either by filing an issue on GitHub or via this email thread if you don't have a GitHub account, and we'll make the relevant changes on the site.

Best,
Misha


Updating our business info on your site

Rosy Hunt <rosy.hunt@...>
 

Hello

Can you let me know who I need to speak to about updating our info on your resource pages? We have two shiny new products which are worth mentioning, and your site currently links to a somewhat outdated post :)

Many thanks in advance for your help.

Rosy

--
Rosy Hunt
Content Marketing Specialist
Cambridge Intelligence

 

Cambridge Intelligence Limited which is registered in England and Wales with Company Number 07625370 | VAT Number 113 1740 61. Registered Office 6-8 Hills Road, Cambridge, CB2 1JP, UK.


separate elastic search for separate graph

51kumarakhil@...
 

Hi! I've set up Elasticsearch for ConfiguredGraphFactory and Bigtable, using the configuration below.

Configurations:
storage.lock.wait-time=100
storage.hbase.ext.google.bigtable.instance.id=<bigtable-id>
index.search.hostname=<host-name>
index.search.index-name=janusgraph_metadata
index.search.port=9243
index.search.elasticsearch.ssl.enabled=true
index.search.elasticsearch.http.auth.basic.password=<password>
index.search.elasticsearch.http.auth.type=basic
index.search.elasticsearch.http.auth.basic.username=<username>
storage.backend=hbase
storage.hostname=localhost
schema.default=none
storage.batch-loading=true
storage.hbase.ext.google.bigtable.project.id=<project-id>
graph.timestamps=MICRO
index.search.elasticsearch.connect-timeout=10000000
index.search.backend=elasticsearch
storage.hbase.ext.hbase.client.connection.impl=com.google.cloud.bigtable.hbase2_x.BigtableConnection
storage.hbase.keyspace=jgex

------------------------------------------------------------------------
--------------------------------------------------------------------------------------

Create Graph
ConfiguredGraphFactory.create('graph_01')


Adding MixedIndex on property 'name'
mgmt = graph_01.openManagement();
mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.single).make();
name = mgmt.getPropertyKey("name");
mgmt.buildIndex('byNomeUniqueMixed', Vertex.class).addKey(name, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search");
mgmt.commit();


Adding Data
graph_01_traversal.addV('person').property('name', 'Tom Cruise')


Fetching Data
graph_01_traversal.V().has("name", textPrefix("Tom"))

output: v[6753]

Question #1: Is this the correct way to set up Elasticsearch?


------------------------------------------------------------------------
--------------------------------------------------------------------------------------


Using the same approach, suppose I create a new graph, say graph_02, with the same configuration and mixed index, and add a vertex:

graph_02_traversal.addV('person').property('name', 'Tom Holand')

and then try to fetch it:

graph_02_traversal.V().has("name", textPrefix("Tom Holand"))

output:
v[6753]
v[9785]


Here I'm getting data from graph_01 as well, despite using graph_02_traversal in the 'has' query.

Question #2: Is there a way to set up a separate Elasticsearch for each graph?
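
One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that both graphs were created with the same index.search.index-name and the same mixed index name, so their documents land in the same Elasticsearch index. A minimal sketch of deriving a per-graph configuration from a shared template follows. The class and method names and the "janusgraph_" prefix are hypothetical; the resulting map would still need to be handed to JanusGraph, for example via ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map)).

```java
import java.util.HashMap;
import java.util.Map;

public class PerGraphIndexConfig {

    /** Copies the shared template and gives the graph its own index-name value. */
    static Map<String, String> configFor(String graphName, Map<String, String> template) {
        Map<String, String> config = new HashMap<>(template);
        config.put("graph.graphname", graphName);
        // Assumption: a distinct, lowercase index name per graph keeps the
        // mixed indexes of different graphs apart in Elasticsearch.
        config.put("index.search.index-name", "janusgraph_" + graphName.toLowerCase());
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> template = new HashMap<>();
        template.put("index.search.backend", "elasticsearch");
        template.put("index.search.index-name", "janusgraph_metadata");

        // Each graph ends up with a different index-name value.
        System.out.println(configFor("graph_01", template).get("index.search.index-name"));
        System.out.println(configFor("graph_02", template).get("index.search.index-name"));
    }
}
```

The template itself is left unmodified, so it can be reused for every graph that is created.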



JanusGraph database cache on distributed setup

washerath@...
 

In a multi-node JanusGraph cluster, data modifications made from one instance are not visible to the others until the cache expiry time (cache.db-cache-time) is reached.

According to the documentation [1], enabling the database-level cache is not recommended in a distributed setup, because cached data is not shared among instances.

Are there any suggestions for a solution or workaround that lets me see data changes from other JG instances immediately and avoid stale reads?


[1] https://docs.janusgraph.org/operations/cache/#cache-expiration-time


Re: dynamic graphics, limits and global index

hadoopmarc@...
 

Hi Matt,

Correct, but you should really try this out and see for yourself. Also check the janusgraph db folder after having created two graphs and see what files are created.

Per graph, but you realize that these numbers would require exabytes of storage?
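
As a rough back-of-the-envelope check of the storage claim, the sketch below multiplies the limits quoted in the question (2^59 vertices, 2^60 edges, taken from the thread and not verified here) by an assumed size per element. The 16 bytes per element is a hypothetical, deliberately small figure, and the class and method names are made up for illustration.

```java
import java.math.BigInteger;

public class IdSpaceEstimate {

    private static final BigInteger TWO = BigInteger.valueOf(2);

    /** Bytes needed to store 2^exponent elements at bytesPerElement bytes each. */
    static BigInteger bytesFor(int exponent, long bytesPerElement) {
        return TWO.pow(exponent).multiply(BigInteger.valueOf(bytesPerElement));
    }

    public static void main(String[] args) {
        BigInteger exbibyte = TWO.pow(60);   // 1 EiB = 2^60 bytes
        long bytesPerElement = 16;           // hypothetical size, real elements are larger

        // 2^59 * 16 = 2^63 bytes = 8 EiB
        System.out.println(bytesFor(59, bytesPerElement).divide(exbibyte) + " EiB for 2^59 vertices");
        // 2^60 * 16 = 2^64 bytes = 16 EiB
        System.out.println(bytesFor(60, bytesPerElement).divide(exbibyte) + " EiB for 2^60 edges");
    }
}
```

Even with this small per-element size, exhausting either limit lands in the exbibyte range.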

Best wishes,   Marc


Re: hasNext() slow for large number of incoming edges

Boxuan Li
 

Created https://github.com/JanusGraph/janusgraph/issues/2966 to track the streaming feature request.


Re: Exception while creating vertex with custom vertex id

Umesh Gade
 

Thanks, Marc, for the pointers to check further.
The issue did not reproduce when we brought the setup back to a 3-node cluster configuration. I will keep watching and collect more details if the issue occurs again.



--
Sincerely,
Umesh Gade


Re: Exception while creating vertex with custom vertex id

hadoopmarc@...
 

Hi Umesh,

No, it is not clear at all whether the issue and cassandra-4.0 are related, but generally it is not useful to report bugs regarding unsupported configurations. To dig deeper, I would be curious about:
 - Can you log the vertex id and label for the transactions that fail, and can you make another transaction fail with these values?
 - The stack trace suggests that the issue is in querying the graph schema. Can you trigger the exception by manually querying the graph schema for the vertex labels involved?

Best wishes,     Marc


Re: Exception while creating vertex with custom vertex id

Umesh Gade
 

Hi Marc,
Yes, we are aware of the compatibility matrix. We are keeping an eye on issues with JG-0.6.0 + Cassandra-4.0. This combination has been running for the last 3-4 months without any issues for our use cases.
Is the above issue certain to be a compatibility issue?



--
Sincerely,
Umesh Gade


Re: Exception while creating vertex with custom vertex id

hadoopmarc@...
 

Hi Umesh,

On the first line of your first post you state that you use cassandra-4.0. However, support for cassandra-4.0 is still an open issue:

https://github.com/JanusGraph/janusgraph/issues/2325
https://docs.janusgraph.org/changelog/#version-compatibility-matrix

It is confusing, indeed, that the driver version is 4.13.0, but this is correct, see:
https://github.com/JanusGraph/janusgraph/blob/v0.6.1/pom.xml

Best wishes,    Marc


On Sat, Jan 29, 2022 at 06:07 PM, Umesh Gade wrote:
set-vertex-id


Re: Exception while creating vertex with custom vertex id

Umesh Gade
 

Hi Marc,
Graph config is as below
"configuration" : {
      "attributes.custom.attribute1.attribute-class" : "java.util.HashMap",
      "attributes.custom.attribute1.serializer-class" : "com.vrts.itrp.graph.serializers.HashMapSerializer",
      "attributes.custom.attribute2.attribute-class" : "java.util.HashSet",
      "attributes.custom.attribute2.serializer-class" : "com.vrts.itrp.graph.serializers.HashSetSerializer",
      "attributes.custom.attribute3.attribute-class" : "java.util.ArrayList",
      "attributes.custom.attribute3.serializer-class" : "com.vrts.itrp.graph.serializers.ArrayListSerializer",
      "attributes.custom.attribute4.attribute-class" : "java.util.LinkedHashMap",
      "attributes.custom.attribute4.serializer-class" : "com.vrts.itrp.graph.serializers.LinkedHashMapSerializer",
      "attributes.custom.attribute5.attribute-class" : "java.util.LinkedList",
      "attributes.custom.attribute5.serializer-class" : "com.vrts.itrp.graph.serializers.LinkedListSerializer",
      "attributes.custom.attribute6.attribute-class" : "java.util.LinkedHashSet",
      "attributes.custom.attribute6.serializer-class" : "com.vrts.itrp.graph.serializers.LinkedHashSetSerializer",
      "query.optimizer-backend-access" : false,
      "storage.cql.executor-service.enabled" : false,
      "storage.backend" : "cql",
      "storage.cql.read-consistency-level" : "LOCAL_QUORUM",
      "storage.cql.write-consistency-level" : "LOCAL_QUORUM",
      "storage.read-only" : false,
      "storage.cql.only-use-local-consistency-for-system-operations" : false,
      "log.tx.key-consistent" : false,
      "storage.hostname" : "localhost",
      "storage.port" : 9042,
      "storage.cql.keyspace" : "xxx_ks",
      "storage.cql.local-datacenter" : "dc1",
      "storage.cql.atomic-batch-mutate" : false,
      "storage.lock.retries" : 20,
      "query.fast-property" : false,
      "query.force-index" : true,
      "graph.set-vertex-id" : true
    }


Code snippet which creates vertex labels:
graph.tx().rollback();
JanusGraphManagement janusMgmt = graph.openManagement();

labels.forEach(l -> {
    VertexLabel vl = janusMgmt.getVertexLabel(l);
    if (vl == null) {
        logger.info("Creating vertex label " + l);
        janusMgmt.makeVertexLabel(l).make();
    } else {
        logger.info("Vertex label " + l + " already exists");
    }
});


Further update on the issue:
- Our code retries continuously on exception. The issue went away when we scaled the 3-node cluster down to a 1-node cluster.
- I see the warning logs below from around the same time.
2022-01-13 15:18:17,042+05:30 WARN com.datastax.oss.driver.internal.core.pool.ChannelPool [JanusGraph Session-admin-7] - [JanusGraph Session|/10.221.187.3:9042]  Error while opening new channel (ConnectionInitException: [JanusGraph Session|connecting...] Protocol initialization request, step 1 (STARTUP {CQL_VERSION=3.0.0, DRIVER_NAME=DataStax Java driver for Apache Cassandra(R), DRIVER_VERSION=4.13.0, CLIENT_ID=73988c72-3e8e-426b-b3cd-f6d7893a9f04}): failed to send request (io.netty.channel.StacklessClosedChannelException))

2022-01-13 15:18:31,174+05:30 WARN com.datastax.oss.driver.internal.core.pool.ChannelPool [JanusGraph Session-admin-7] - [JanusGraph Session|/10.221.187.3:9042]  Error while opening new channel (ConnectionInitException: [JanusGraph Session|connecting...] Protocol initialization request, step 1 (STARTUP {CQL_VERSION=3.0.0, DRIVER_NAME=DataStax Java driver for Apache Cassandra(R), DRIVER_VERSION=4.13.0, CLIENT_ID=cb5b14df-85bf-4b90-b48b-be8fdac2404e}): failed to send request (java.nio.channels.NotYetConnectedException))






--
Sincerely,
Umesh Gade


Re: dynamic graphics, limits and global index

Matthew Nguyen
 

Hi Marc, just to be clear.

If I create Graph A and build an index on Vertex P via something like:

JanusGraphManagement mgmt = graph.openManagement();
PropertyKey propP = mgmt.getPropertyKey("propertyP");  // assume 'propertyP' has been created via makePropertyKey()
mgmt.buildIndex("byP", Vertex.class).addKey(propP).buildCompositeIndex();

and I create another Graph B, build the same index, and insert "Hello World" via

GraphA.addV().property("propertyP", "Hello World") and
GraphB.addV().property("propertyP", "Hello World")

there should be no conflict between the two graphs on propertyP, correct? I assume not, but it is something I want to be absolutely clear on. Also, are the limits on the number of vertices (2^59?) and edges (2^60) per graph or per storage installation?

thx, matt


Re: Exception while creating vertex with custom vertex id

hadoopmarc@...
 

Hi Umesh,

Can you please add the graph properties (configs) and the statements for creating the vertex labels in the graph schema (if any)?

Cheers,    Marc


Re: dynamic graphics, limits and global index

hadoopmarc@...
 

Hi Matthew,

I suppose you mean identifiers instead of indices? The identifier space is per graph. The storage backends make separate files per graph.

Marc


Re: hasNext() slow for large number of incoming edges

Boxuan Li
 

Hi Matt,

No worries, let me create an issue. You are right, g.E().hasNext() is fast, and that’s because the results are streamed. On the other hand, g.V().has("id", "v0").outE().hasNext() is slow if vertex v0 has a huge number of incident edges, because the results in this case are not streamed. It definitely needs some investigation, but usually it’s not a big problem because people don’t expect a large number of incident edges attached to a node.
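
To make the difference concrete, here is a toy model in plain Java. It is not JanusGraph code and says nothing about the actual internals. It only contrasts an iterator that fetches edges on demand with one that loads every edge before answering hasNext(). All names are hypothetical, and the counter stands in for backend reads.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicLong;

public class HasNextCost {

    /** Streamed source: an edge is fetched only when the iterator needs one. */
    static Iterator<Long> streamedEdges(long edgeCount, AtomicLong fetched) {
        return new Iterator<Long>() {
            private long cursor = 0;
            private Long buffered = null;

            @Override
            public boolean hasNext() {
                if (buffered == null && cursor < edgeCount) {
                    fetched.incrementAndGet(); // one simulated backend read
                    buffered = cursor++;
                }
                return buffered != null;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Long result = buffered;
                buffered = null;
                return result;
            }
        };
    }

    /** Materialized source: every edge is fetched before iteration starts. */
    static Iterator<Long> materializedEdges(long edgeCount, AtomicLong fetched) {
        List<Long> all = new ArrayList<>();
        for (long id = 0; id < edgeCount; id++) {
            fetched.incrementAndGet(); // one simulated backend read per edge
            all.add(id);
        }
        return all.iterator();
    }

    /** Number of simulated reads needed to answer a single hasNext() call. */
    static long fetchesForHasNext(boolean streamed, long edgeCount) {
        AtomicLong fetched = new AtomicLong();
        Iterator<Long> edges = streamed
                ? streamedEdges(edgeCount, fetched)
                : materializedEdges(edgeCount, fetched);
        edges.hasNext();
        return fetched.get();
    }

    public static void main(String[] args) {
        long edgeCount = 100_000;
        System.out.println("streamed reads:     " + fetchesForHasNext(true, edgeCount));
        System.out.println("materialized reads: " + fetchesForHasNext(false, edgeCount));
    }
}
```

In this model the streamed iterator answers hasNext() after a single read, while the materialized one pays for every incident edge up front, which is the cost pattern described for a supernode.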

Good luck with your work! If it’s not JanusGraph specific, you might also want to join the TinkerPop Discord server (https://discord.gg/ndMpKZcBEE) to interact with the wider graph community.

Best,
Boxuan



Re: hasNext() slow for large number of incoming edges

Matthew Nguyen
 

Hi Boxuan,

Happy to put in a request on GitHub but still a little confused. Are we saying g.E().has('index_key', 'large_number_of_edges').hasNext() isn't streaming but should (note: g.E().hasNext() is fast)? Also, I think to close the gap on RDF/Property Graph, we do need to see what can be done about allowing for natural modeling in RDF, which is really to make liberal use of edges. The problem with properties and RDF is that RDF expects you to index virtually everything in order for the queries to be quick. Not sure how we can model non-generic properties in that capacity.

BTW I'm using Joshua's Graphsail (https://github.com/joshsh/graphsail) implementation to see if I can get it to work and trying to work through some of the edge (no pun intended) cases.

thx, matt


Exception while creating vertex with custom vertex id

Umesh Gade
 

Hi all,
We faced the exception below while creating a vertex with a custom vertex id. The issue does not reproduce consistently. The setup is JG-0.6.0 + cassandra-4.0 in a 3-node, single-DC cluster.
java.lang.NullPointerException: null
        at org.janusgraph.graphdb.types.VertexLabelVertex.isPartitioned(VertexLabelVertex.java:41) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.types.VertexLabelVertex.hasDefaultConfiguration(VertexLabelVertex.java:67) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.addVertex(StandardJanusGraphTx.java:579) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsTransaction.addVertex(JanusGraphBlueprintsTransaction.java:127) ~[janusgraph-core-0.6.0.jar:?]
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph.addVertex(JanusGraphBlueprintsGraph.java:143) ~[janusgraph-core-0.6.0.jar:?]

Any pointers for debugging further?



Re: hasNext() slow for large number of incoming edges

Boxuan Li
 

Hi Matt,

It will definitely be a valid and valuable feature, if we could expose the streaming capacity to end users. If I recall correctly, the low-level results are indeed streamed (it might vary depending on the storage backend), but the interface is not exposed to the upper level APIs. Do you want to create a feature request on GitHub? Otherwise I can do it later.

Regarding your particular use case, you said you had triples like <microsoft> <rdfs:type> <company>, which you modelled as V('microsoft') -> E('rdfs:type') -> V('company'). I suggest you model “type” as a property rather than an edge. So, you will not create a vertex called “company”. Rather, you create a vertex called “microsoft” with a property “type” whose value is “company”, e.g.

g.addV().property("value", "microsoft").property("type", "company")

Rule of thumb: when you anticipate a super node, consider modeling it as a property rather than a vertex. Edges should be used to describe “relationships between nodes” rather than “properties attached to nodes”. This is the difference between an RDF graph and a property graph.

Best,
Boxuan

On Thu, Jan 27, 2022 at 12:51 AM Matthew Nguyen via lists.lfaidata.foundation <nguyenm9=aol.com@...> wrote:

Hi Boxuan, thanks for the response. Some background:  I'm trying to use JG as a triplestore and importing rdf.

The triple <microsoft> <rdfs:type> <company> can be modelled as V('microsoft') -> E('rdfs:type') -> V('company') such that:

g.V().has('value', 'microsoft').out().has('value', 'company').inE('rdfs:type').hasNext() = true

Certainly there can be millions of companies out there that can be modelled similarly. I understand the issues surrounding supernodes, so perhaps this question is more about trying to understand some internals of JG.

Note: again, my use case is not exactly like the above, where everything is known, but more like the SPARQL query: select ?company where { ?company rdfs:type <company> }, or "give me all companies of rdfs:type company", which translates to Gremlin:
   g.V().has('value','company').inE() and then a traversal over inE(). But g.V().has('value','company').inE().hasNext() takes a long time to initially run.

1) What is g.V(v).inE(e).hasNext() doing above that makes a call on a supernode take so long? If it's trying to load all incident edges, should the documentation be updated, or maybe the function be renamed to reflect potential latency issues? Or maybe the implementation could be broken up, something like C++ iteration -> traversal.begin(); while (traversal.hasNext()) traversal.next()... or something like that. begin() and hasNext() could be implemented via the range(..) function you mentioned to better control perceived latency.

2) When you mention remodelling, I can think of 2 ways to do so off the top of my head (please advise on others).
a. Have multiple types of Companies (TechCompany, FinancialCompany, etc.) to reduce the likelihood of a supernode
b. Add a property to V('microsoft'), i.e. has('rdfs:type', 'company'). If I do this, and assuming 'rdfs:type' is an indexed property, will V().has('rdfs:type', 'company').hasNext() be fast? If so, why?

I hope this doesn't come across negatively.  I am very interested in trying to bridge the gap btwn LPG & RDF (3store) and I think I have some good use cases that can hopefully help to improve JG down the road.

thx, matt

 

 
