
Re: Indexing Strategies for RDF edges/predicates on Janusgraph

AMIYA KUMAR SAHOO
 

Hi Matthew,


According to the note below from the JanusGraph docs, traversing by edge label will be fast even if the company has 1k different types of edges attached to it.

For example: find the employees connected to the company by the employedBy edge label.

But if a single edge type has high cardinality, you have to manually create an edge index on the relevant property.

Note from the docs: "JanusGraph automatically builds vertex-centric indexes per edge label and property key. That means, even with thousands of incident 'battled' edges, queries like g.V(h).out('mother') or g.V(h).values('age') are efficiently answered by the local index."
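
As a rough intuition for why a lookup by edge label stays cheap on a vertex with many edge types, here is a toy model in plain Java. It is not JanusGraph code: the class name, the separator character and the key layout are assumptions made for illustration. It only mimics the idea that a vertex's edges are kept sorted by label, so fetching one label is a range scan rather than a pass over every edge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one vertex's adjacency list, kept sorted by edge label. */
public class AdjacencySliceSketch {
    // Separator assumed to sort below any character used in a label.
    private static final char SEP = '\u0000';

    // Sorted "columns" of a single vertex: key = label + SEP + neighbour.
    private final NavigableMap<String, String> columns = new TreeMap<>();

    public void addEdge(String label, String neighbour) {
        columns.put(label + SEP + neighbour, neighbour);
    }

    /** Range scan over exactly the keys that start with label + SEP. */
    public List<String> neighboursByLabel(String label) {
        String from = label + SEP;
        String to = label + (char) (SEP + 1);
        return new ArrayList<>(columns.subMap(from, true, to, false).values());
    }

    public static void main(String[] args) {
        AdjacencySliceSketch someCompany = new AdjacencySliceSketch();
        someCompany.addEdge("employedBy", "matt");
        someCompany.addEdge("employedBy", "jane");
        someCompany.addEdge("isSoldBy", "product1");
        someCompany.addEdge("isLeasedBy", "some_office");
        // Only the employedBy range is visited, however many other labels exist.
        System.out.println(someCompany.neighboursByLabel("employedBy"));
    }
}
```

The cost of neighboursByLabel depends on the number of edges with that label, not on the number of distinct labels. When a single label has very many edges, the same trick needs a further sort key inside the label, which is the role of a manually created edge index.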


Thanks,
Amiya


On Mon, 24 Jan 2022, 12:32 , <hadoopmarc@...> wrote:
Hi Matthew,

It would be possible to replace the employedBy, isSoldBy, isLeasedBy relations with a relatedToCompany relation with employment, selling and lease properties. But I do not see any advantage compared to the original model, because there is nothing wrong with a lot of frequently used vertex-centric indices and the original model is easier to use.

Cheers,     Marc


Re: Indexing Strategies for RDF edges/predicates on Janusgraph

hadoopmarc@...
 

Hi Matthew,

It would be possible to replace the employedBy, isSoldBy, isLeasedBy relations with a relatedToCompany relation with employment, selling and lease properties. But I do not see any advantage compared to the original model, because there is nothing wrong with a lot of frequently used vertex-centric indices and the original model is easier to use.

Cheers,     Marc


Indexing Strategies for RDF edges/predicates on Janusgraph

Matthew Nguyen <nguyenm9@...>
 

Hi, I am trying to build a triplestore on top of JG. The general model is:

Vertex (subject or object) Properties:
  Label
  Value (IRI, Literal) - indexed

Edge (predicate) Properties:
  Label (predicate)
  hash - effectively a unique hash of predicate so I can globally index it

So effectively we can have Vertex(subject) -> Edge (predicate) -> Vertex(object)

Let's assume I insert the following triples into this model

<matt> <employedBy> <some_company>
<jane> <employedBy> <some_company>
<product1> <isSoldBy> <some_company>
<some_office> <isLeasedBy> <some_company>
etc

Let's say there are literally 1k different predicates that can be associated with <some_company>, and predicates like <employedBy> can have high cardinality if the company is large. What's a good way to index these edges/predicates so I can quickly query for all edges/predicates of a particular type on <some_company> (e.g. 'give me all the ?people <employedBy> <some_company>')?

I'm aware of the vertex-centric indexes on edges but it appears I would need to build an index for each of the possible edge labels of <some_company> if I understand the docs correctly (https://docs.janusgraph.org/schema/index-management/index-performance/#edge-indexes).  Please correct me if I'm wrong.  If not, is there another strategy I can use?

thx, matt


Re: Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas
 

Hi Marc,

Thanks for your suggestion,

However I am testing it on a local environment having a single replication factor. I believe if the replication factor is one then in all cases it should give me the same data/information in other instances as well. 

Screenshot 2022-01-23 at 5.11.26 PM.png

see below local property file information

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1
storage.cql.keyspace=janusgraph
storage.port=9042
schema.constraints=true
############ CQL Properties ############

storage.cql.read-consistency-level=LOCAL_QUORUM
storage.cql.write-consistency-level=LOCAL_QUORUM
storage.cql.replication-factor=1

Please see the attached API code for create, update and get in the local sample application. Let me know if something is wrong there, because the refresh of data is not working on another embedded instance with the same configuration.

Thanks,
Pawan
 

On Thu, Jan 20, 2022 at 12:44 PM <hadoopmarc@...> wrote:
Hi Pawan,

You are right, if issues already arise without index, you should investigate that first, even though a large graph without indices is useless in itself.
See the third question from Boxuan Li above, in particular:
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlAboutDataConsistency.html

Best wishes,   Marc



--
Thanks & Regard

PAWAN SHRIWAS


[ANNOUNCE] JanusGraph 0.6.1 Release

Oleksandr Porunov
 

The JanusGraph Technical Steering Committee is excited to announce the release of JanusGraph 0.6.1.

JanusGraph is an Apache TinkerPop enabled property graph database with support for a variety of storage and indexing backends. Thank you to all of the contributors.

The release artifacts can be found at this location:
    https://github.com/JanusGraph/janusgraph/releases/tag/v0.6.1

A full binary distribution is provided for user convenience:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.6.1/janusgraph-full-0.6.1.zip
 
A truncated binary distribution is provided:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.6.1/janusgraph-0.6.1.zip

The online docs can be found here:
    https://docs.janusgraph.org
 
To view the resolved issues and commits check the milestone here:
    https://github.com/JanusGraph/janusgraph/milestone/22?closed=1

Thank you very much,
Oleksandr Porunov


Re: High HBase backend 'configuration' row contention

hadoopmarc@...
 

Hi Tendai,

"Not serializable" sounds as if you pass a JanusGraph instance from the Spark driver to the executor. The function that runs on the Spark executor should call some static function on the singleton object that holds the JanusGraph instance. If the singleton object is called for the first time, locally on each Spark executor, it creates the JanusGraph instance and its static convenience method returns a GraphTraversalSource g. If the executor function runs a second time (on the next partition of your RDD or DataFrame as input) it again calls the convenience function on the singleton object, but now gets a GraphTraversalSource returned from the existing JanusGraph instance.

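The pattern described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: FakeGraph is a hypothetical stand-in for the JanusGraph instance, and GraphHolder and processPartition are made-up names. The point is that the graph lives in a static field of a class that is loaded locally in each executor JVM, so nothing graph-related is ever serialized with the Spark closure.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorGraphHolderSketch {

    /** Hypothetical stand-in for a JanusGraph instance; deliberately not Serializable. */
    static final class FakeGraph {
        static final AtomicInteger OPENED = new AtomicInteger();

        FakeGraph() {
            OPENED.incrementAndGet();
        }

        String traversal() {
            return "g";
        }
    }

    /** One holder per JVM, hence one graph per Spark executor. */
    static final class GraphHolder {
        private static FakeGraph graph;

        /** Static convenience method: opens the graph on first use, reuses it afterwards. */
        static synchronized String traversal() {
            if (graph == null) {
                // Real code would open the graph here (for example via JanusGraphFactory).
                graph = new FakeGraph();
            }
            return graph.traversal();
        }
    }

    /** The function shipped to executors captures no graph object; it only calls the static method. */
    static int processPartition(List<String> rows) {
        String g = GraphHolder.traversal(); // obtained locally, on the executor
        int handled = 0;
        for (String row : rows) {
            // Real code would run a traversal with g for each row.
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        processPartition(Arrays.asList("a", "b"));
        processPartition(Arrays.asList("c"));
        // The second partition reused the instance created for the first one.
        System.out.println("graphs opened: " + FakeGraph.OPENED.get());
    }
}
```

In Scala the same effect is usually obtained with an object holding a lazy val; either way the holder itself is never passed from the driver to the executors.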
Best wishes,     Marc


Re: High HBase backend 'configuration' row contention

Tendai Munetsi
 

Hi Marc,

Thanks for the feedback and suggestion. We investigated applying the JanusGraphFactory inside a singleton object as you've suggested, but ran into the issue that the JanusGraphFactory is not serializable as required for Spark singletons. Do you have any ideas of how to get around this issue?

Thanks,
Tendai


Re: Janusgraph embedded multi instance(JVM) data sync issue

hadoopmarc@...
 

Hi Pawan,

You are right, if issues already arise without index, you should investigate that first, even though a large graph without indices is useless in itself.
See the third question from Boxuan Li above, in particular:
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlAboutDataConsistency.html

Best wishes,   Marc


Re: JG Schema - addConnection seem to create duplicate connections

Peter Molnar
 

Hi Marc,

Thanks a lot for looking into this. As requested, I filed an issue about this on GitHub: https://github.com/JanusGraph/janusgraph/issues/2950

Thanks,
Peter


Re: Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas
 

Hi Marc,

I don't think the cached (stale) data is due to Elasticsearch/the mixed index only. I have seen this on a basic property/node without an index as well. I suggest we work on the basic node/property case first; then we can plan for the mixed index cases.

Any suggestions for the basic case without an index backend?

Thanks,
Pawan


On Sat, Jan 15, 2022 at 5:16 PM <hadoopmarc@...> wrote:
Hi Pawan,

OK, let's investigate further. You say that the issue occurs for both vertex creation and modification. Let's take the clearest case first: vertex creation with an indexed property. So, in your system setup, if you have added a new vertex with embedded instance1, it sometimes takes a minute or more before a query for this vertex (based on its property value) on instance2 returns the vertex. This can only mean that the Elasticsearch index sometimes does not return the new property value. This in turn means that an Elasticsearch replica has not yet been synced with the data about the new vertex.

Indeed, the janusgraph-elastic configs have a key index.[X].elasticsearch.bulk-refresh (default: false) which can be set to any of the values in:
https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-refresh.html

One can check the correspondence between this janusgraph config item and the elasticsearch API parameter in:
https://github.com/JanusGraph/janusgraph/blob/v0.6.0/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/rest/RestElasticSearchClient.java

So, can you see what happens with the other possible values for index.[X].elasticsearch.bulk-refresh?

Best wishes,    Marc





Re: JG Schema - addConnection seem to create duplicate connections

hadoopmarc@...
 

Hi Peter,

Thanks for reporting. I think it is a bug. I checked with the standalone gremlin REPL of janusgraph-0.6.0, using:
graph = JanusGraphFactory.open('conf/janusgraph-inmemory.properties')

This gives the same results and if you add the from toEdge connections first, the FromEdge gets 4 connections.

You can check that two of the four connections are redundant, that is, they refer to the same edge in the schema:

gremlin> edges[1].mappedConnections()
==>org.janusgraph.core.Connection@1fecfaea
==>org.janusgraph.core.Connection@4872669f
==>org.janusgraph.core.Connection@483f286e
==>org.janusgraph.core.Connection@4bb147ec
gremlin> edges[1].mappedConnections()[0].getConnectionEdge()
==>e[hs0-el-1th-st][525-~T$SchemaRelated->1037]
gremlin> edges[1].mappedConnections()[1].getConnectionEdge()
==>e[ikg-el-1th-171][525-~T$SchemaRelated->1549]
gremlin> edges[1].mappedConnections()[2].getConnectionEdge()
==>e[hs0-el-1th-st][525-~T$SchemaRelated->1037]
gremlin> edges[1].mappedConnections()[3].getConnectionEdge()
==>e[ikg-el-1th-171][525-~T$SchemaRelated->1549]

Finally, I checked that the schema results remain the same if you add the following config properties to the graph (as suggested by the ref docs):
schema.default=none
schema.constraints=true

Can you please report this as an issue on: https://github.com/JanusGraph/janusgraph/issues

Best wishes,   Marc


On Tue, Jan 11, 2022 at 01:06 PM, Peter Molnar wrote:
mgmt = graph.openManagement();


Re: Fastest way to check if a property key is mixed indexed or not

hadoopmarc@...
 

Hi Harshit,

The performance impact for JanusGraph of including a property key in multiple mixed indices is negligible (the selection of the index for a specific query will be a tad slower). Additional mixed indices imply a heavier load on the indexing backend (in particular memory and storage, and CPU during inserts) but have little impact on response times if the cluster is dimensioned properly.

Marc


Re: Janusgraph embedded multi instance(JVM) data sync issue

hadoopmarc@...
 

Hi Pawan,

OK, let's investigate further. You say that the issue occurs for both vertex creation and modification. Let's take the clearest case first: vertex creation with an indexed property. So, in your system setup, if you have added a new vertex with embedded instance1, it sometimes takes a minute or more before a query for this vertex (based on its property value) on instance2 returns the vertex. This can only mean that the Elasticsearch index sometimes does not return the new property value. This in turn means that an Elasticsearch replica has not yet been synced with the data about the new vertex.

Indeed, the janusgraph-elastic configs have a key index.[X].elasticsearch.bulk-refresh (default: false) which can be set to any of the values in:
https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-refresh.html

One can check the correspondence between this janusgraph config item and the elasticsearch API parameter in:
https://github.com/JanusGraph/janusgraph/blob/v0.6.0/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/rest/RestElasticSearchClient.java

So, can you see what happens with the other possible values for index.[X].elasticsearch.bulk-refresh?
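
For illustration, trying one of the other values could look like the properties fragment below. The index name "search" is an assumption (use whatever name stands for [X] in your own configuration), and wait_for is just one of the values listed on the Elasticsearch refresh page linked above (true, false, wait_for). Whether it removes the delay in this setup is untested.

```properties
# Assumed index name "search"; replace with the name used for [X] in your config.
index.search.backend=elasticsearch
# Ask Elasticsearch to hold the bulk request until the change is visible to search.
index.search.elasticsearch.bulk-refresh=wait_for
```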

Best wishes,    Marc


Re: Fastest way to check if a property key is mixed indexed or not

Harshit Sharma
 

Will there be any performance impact if I index a property key in multiple indices (mixed indexes)?


On Sat, 15 Jan, 2022, 3:55 pm , <hadoopmarc@...> wrote:
Hi Harshit,

The concept "property is indexed or not" is ambiguous because an index can have multiple property keys. If you want to know if there is an index with a specific property key as the only key, indeed you would have to do something like in your example code (but modified).

Best wishes,   Marc


Re: Fastest way to check if a property key is mixed indexed or not

hadoopmarc@...
 

Hi Harshit,

The concept "property is indexed or not" is ambiguous because an index can have multiple property keys. If you want to know if there is an index with a specific property key as the only key, indeed you would have to do something like in your example code (but modified).
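
To make the ambiguity concrete, here is a small plain-Java sketch. It uses no JanusGraph classes: the map from index name to field key names, and the index names themselves, are hypothetical sample data that real code would fill from the management API. It shows that "is the key in some index" and "is there an index with this key as its only key" are two different checks.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexKeyLookupSketch {

    /** True if at least one index lists the key among its field keys. */
    static boolean inAnyIndex(Map<String, List<String>> fieldKeysByIndex, String key) {
        for (List<String> fieldKeys : fieldKeysByIndex.values()) {
            if (fieldKeys.contains(key)) {
                return true;
            }
        }
        return false;
    }

    /** True only if some index has the key as its sole field key. */
    static boolean hasSingleKeyIndex(Map<String, List<String>> fieldKeysByIndex, String key) {
        for (List<String> fieldKeys : fieldKeysByIndex.values()) {
            if (fieldKeys.size() == 1 && fieldKeys.get(0).equals(key)) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical sample: one composite two-key index and one single-key index. */
    static Map<String, List<String>> sampleIndexes() {
        Map<String, List<String>> indexes = new LinkedHashMap<>();
        indexes.put("byNameAndAge", Arrays.asList("name", "age"));
        indexes.put("byAge", Arrays.asList("age"));
        return indexes;
    }

    public static void main(String[] args) {
        Map<String, List<String>> indexes = sampleIndexes();
        System.out.println(inAnyIndex(indexes, "name"));
        System.out.println(hasSingleKeyIndex(indexes, "name"));
        System.out.println(hasSingleKeyIndex(indexes, "age"));
    }
}
```

Here "name" appears in an index but only together with "age", so a query on "name" alone is not necessarily served by an index even though the first check succeeds.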

Best wishes,   Marc


Re: Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas
 

Hi Marc,

I have removed the cache properties from the instances, and we already start a new transaction for each API operation, but we are still seeing stale data in other instances for some time.

Below is the code used to create the new transaction for each operation.

In my embedded JanusGraph service, we always create a new transaction for each API operation using the code below, and commit or roll back at the end of the operation. Sometimes it works and sometimes it does not. Is this a synchronization issue between graph instances running in multiple services (JVMs)?

// Create the graph instance (once, at service start)
  String filePath = ConfigUtils.getString(GraphConstants.GRAPH_FILE_PATH);
  JanusGraph graphinstance = embeddedConnection.open(filePath);

// Create a new transaction for each API operation
  JanusGraphTransaction threadedTransaction = graphinstance.getGraphInstance().newTransaction();

// Commit or roll back at the end of each API operation
  threadedTransaction.commit();
  // or
  threadedTransaction.rollback();

Let me know if there is any configuration or code change I should try for this.

Thanks,
Pawan

On Fri, Jan 7, 2022 at 1:45 PM <hadoopmarc@...> wrote:
Hi Pawan,

Your requirement for instant synchronization cannot work with JanusGraph caches enabled, because JanusGraph will get data from the cache if available, instead of getting the latest data from the backend. So,

  • cache.db-cache = false
  • be sure to start a new transaction before querying for the latest data (e.g. by executing a g.tx().commit())
Best wishes,    Marc





Fastest way to check if a property key is mixed indexed or not

Harshit Sharma
 

Is there a way I can check if a particular property is indexed or not?

I know the following method, but with it I have to traverse all indexes:

// mgmt is a JanusGraphManagement, e.g. obtained from graph.openManagement()
for (JanusGraphIndex index : mgmt.getGraphIndexes(Vertex.class)) {
  for (PropertyKey propertyKey : index.getFieldKeys()) {
    if (propertyKey.name().equals("KEY1")) {
      return true;
    }
  }
}
return false;

Is there a better way to do the same?
--
Regards,

Harshit Sharma
+91-9901459920


Re: New Property keys in existing index getting stuck in registered state

Harshit Sharma
 

Is it allowed to index the same property key in two different indexes?
For example, I created a property key graphId and two indexes, vertexIndex and edgeIndex,
and indexed graphId in both.

The problem I'm facing is that graphId is getting enabled in vertexIndex, because I'm creating it first, but it is getting stuck in the REGISTERED state for edgeIndex.

On Wed, Jan 12, 2022 at 6:38 AM Boxuan Li <liboxuan@...> wrote:
Can you post the stacktrace (or the place where NPE is thrown)?

On Wed, Jan 12, 2022 at 2:50 AM Harshit Sharma <harshit.sharma1080@...> wrote:
That is not working. According to the documentation (https://docs.janusgraph.org/schema/index-management/index-performance/),
after building the index I'm calling ManagementSystem.awaitGraphIndexStatus(graph, INDEX_NAME).call(),
but this call is throwing a NullPointerException.

On Tue, Jan 11, 2022 at 8:18 PM Boxuan Li <liboxuan@...> wrote:
Hi Harshit,


On Tue, Jan 11, 2022 at 10:43 PM Harshit Sharma <harshit.sharma1080@...> wrote:
Sorry, posting again because of a typo -

If I'm adding new keys to an existing index, many keys are getting stuck in the REGISTERED or INSTALLED state.
For example, I indexed a key "graphId" for vertices in an index "graphVertexIndex"
and I indexed the same key for edges in an index named "graphEdgeIndex".
Now it is getting enabled in "graphVertexIndex" but getting stuck in the REGISTERED state in "graphEdgeIndex".

On Tue, Jan 11, 2022 at 8:10 PM Harshit Sharma via lists.lfaidata.foundation <harshit.sharma1080=gmail.com@...> wrote:
If I'm adding new keys to an existing index, many keys are getting stuck in the REGISTERED or INSTALLED state.
For example, I indexed a key "graphId" for vertices in an index graphVertexIndex,
while I indexed the same key for edges in graphEdgeIndex.
It is getting enabled in graphVertexIndex but stuck in the REGISTERED state.


Re: New Property keys in existing index getting stuck in registered state

Boxuan Li
 

Can you post the stacktrace (or the place where NPE is thrown)?


Re: New Property keys in existing index getting stuck in registered state

Harshit Sharma
 

That is not working. According to the documentation (https://docs.janusgraph.org/schema/index-management/index-performance/),
after building the index I'm calling ManagementSystem.awaitGraphIndexStatus(graph, INDEX_NAME).call(),
but this call is throwing a NullPointerException.

