Re: Indexing Strategies for RDF edges/predicates on Janusgraph
AMIYA KUMAR SAHOO
Hi Matthew,
The two examples show two different types of default index.
g.V(h).out('mother')
- This is an example of the default vertex-centric index per edge label.
- It helps to traverse one specific type of edge quickly when a vertex has many different types of edges.
- In your case, finding all employees employedBy a company will use this.
g.V(h).values('age')
- This is an example of the default vertex-centric index per property key.
- It helps to get the value of a single property among the several properties of a single vertex.
Now consider a situation where 1k types of edges are attached to one vertex (one company). Except for the employedBy edge, each edge type has a low cardinality (let's say < 10), but 2k employees are employedBy that company. You want to find out whether the company has an employee named John.
In this case, if your traversal starts from the company and follows the employedBy edges, it has to traverse all 2k edges to find out whether John is an employee or not. This can be made faster if the employee name is available on the edge and a vertex-centric index (VCI) is enabled on it.
This might not be a very good example, as it can be optimised in different ways: 1) if the employee has a low degree for the employedBy edge, you can start the traversal from the employee vertex.
Hope it helps, Amiya
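The trade-off described in this message can be illustrated with a toy model in plain Java. This is only a sketch: it is not JanusGraph code and does not reflect how JanusGraph stores edges, and every name in it (VciToyModel, Edge, edgesInspectedByScan, buildIndex, indexedLookup) is hypothetical. It contrasts checking each employedBy edge one by one with a lookup in a per-label structure sorted by employee name, which is roughly the effect a vertex-centric index on an edge property is meant to have.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: NOT JanusGraph code and not its storage layout.
final class VciToyModel {

    // Hypothetical stand-in for an edge incident to the company vertex.
    static final class Edge {
        final String label;
        final String name; // employee name copied onto the edge

        Edge(String label, String name) {
            this.label = label;
            this.name = name;
        }
    }

    // 2000 anonymous employees, then John, then one edge with another label.
    static List<Edge> companyEdges() {
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            edges.add(new Edge("employedBy", "employee" + i));
        }
        edges.add(new Edge("employedBy", "John"));
        edges.add(new Edge("isSoldBy", "product1"));
        return edges;
    }

    // No index on the edge property: count the employedBy edges inspected
    // until the name matches (or until all of them have been checked).
    static int edgesInspectedByScan(List<Edge> edges, String name) {
        int inspected = 0;
        for (Edge e : edges) {
            if (!e.label.equals("employedBy")) {
                continue;
            }
            inspected++;
            if (e.name.equals(name)) {
                break;
            }
        }
        return inspected;
    }

    // Sorted structure per label, keyed by name. Duplicate names collapse here,
    // which is acceptable for this illustration only.
    static Map<String, TreeMap<String, Edge>> buildIndex(List<Edge> edges) {
        Map<String, TreeMap<String, Edge>> index = new HashMap<>();
        for (Edge e : edges) {
            index.computeIfAbsent(e.label, k -> new TreeMap<>()).put(e.name, e);
        }
        return index;
    }

    // Lookup by label and name without walking the other edges.
    static boolean indexedLookup(Map<String, TreeMap<String, Edge>> index,
                                 String label, String name) {
        TreeMap<String, Edge> byName = index.get(label);
        return byName != null && byName.containsKey(name);
    }

    public static void main(String[] args) {
        List<Edge> edges = companyEdges();
        System.out.println("edges inspected by scan: " + edgesInspectedByScan(edges, "John"));
        System.out.println("found via sorted lookup: "
                + indexedLookup(buildIndex(edges), "employedBy", "John"));
    }
}
```

In a real graph the sorted structure would be maintained by the database once an edge index on the name property has been defined; the toy model only shows why the lookup no longer depends on the number of employedBy edges.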
On Tue, 25 Jan 2022, 00:01 Matthew Nguyen via lists.lfaidata.foundation, <nguyenm9=aol.com@...> wrote:
|
|
Re: Indexing Strategies for RDF edges/predicates on Janusgraph
Matthew Nguyen <nguyenm9@...>
Hi Amiya, I saw that but wasn't quite sure of the intent given the example. It talks about edge labels, but the examples are vertices & values? g.V(h).out('mother') -> returns a vertex traversal? Also, what do you mean by 'But if you have a high cardinality for a single edge type, then you have to manually create edge index on respective property.'? thx, matt
|
|
Re: Indexing Strategies for RDF edges/predicates on Janusgraph
AMIYA KUMAR SAHOO
Hi Matthew, As per the note below from the JanusGraph docs, even if a company has 1k different types of edges related to it, traversing by edge label will be fast, for example finding the employees employedBy (edge label) a company. But if you have a high cardinality for a single edge type, then you have to manually create edge index on respective property.
JanusGraph automatically builds vertex-centric indexes per edge label and property key. That means, even with thousands of incident battled edges, queries like g.V(h).out('mother') or g.V(h).values('age') are efficiently answered by the local index.
Thanks, Amiya
On Mon, 24 Jan 2022, 12:32 , <hadoopmarc@...> wrote: Hi Matthew,
|
|
Re: Indexing Strategies for RDF edges/predicates on Janusgraph
hadoopmarc@...
Hi Matthew,
It would be possible to replace the employedBy, isSoldby, isLeasedBy relations with a relatedToCompany relation with employment, selling and lease properties. But I do not see any advantages compared to the original model, because there is nothing wrong with a lot of frequently used vertex-centric indices and the original model is easier to use. Cheers, Marc
|
|
Indexing Strategies for RDF edges/predicates on Janusgraph
Matthew Nguyen <nguyenm9@...>
Hi, I am trying to build a triplestore on top of JG. The general model is:
Vertex (subject or object) Properties:
Edge (predicate) Properties:
So effectively we can have Vertex (subject) -> Edge (predicate) -> Vertex (object):
<matt> <employedBy> <some_company>
I'm aware of the vertex-centric indexes on edges, but it appears I would need to build an index for each of the possible edge labels of <some_company>, if I understand the docs correctly (https://docs.janusgraph.org/schema/index-management/index-performance/#edge-indexes). Please correct me if I'm wrong. If not, is there another strategy I can use? thx, matt
|
|
Re: Janusgraph embedded multi instance(JVM) data sync issue
Pawan Shriwas
Hi Marc, Thanks for your suggestion. However, I am testing it in a local environment with a replication factor of one. I believe that with a replication factor of one, the other instances should in all cases see the same data. See the local property file below:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1
storage.cql.keyspace=janusgraph
storage.port=9042
schema.constraints=true
############ CQL Properties ############
storage.cql.read-consistency-level=LOCAL_QUORUM
storage.cql.write-consistency-level=LOCAL_QUORUM
storage.cql.replication-factor=1
Please see the attached API code for create, update and get in the local sample application. Let me know if something is wrong here, because the refresh of data is not working on another embedded instance with the same configuration. Thanks, Pawan
On Thu, Jan 20, 2022 at 12:44 PM <hadoopmarc@...> wrote: Hi Pawan, --
Thanks & Regard PAWAN SHRIWAS
|
|
[ANNOUNCE] JanusGraph 0.6.1 Release
The JanusGraph Technical Steering Committee is excited to announce the release of JanusGraph 0.6.1.
JanusGraph is an Apache TinkerPop enabled property graph database with support for a variety of storage and indexing backends. Thank you to all of the contributors. The release artifacts can be found at this location:
https://github.com/JanusGraph/janusgraph/releases/tag/v0.6.1
A full binary distribution is provided for user convenience:
https://github.com/JanusGraph/janusgraph/releases/download/v0.6.1/janusgraph-full-0.6.1.zip
A truncated binary distribution is provided:
https://github.com/JanusGraph/janusgraph/releases/download/v0.6.1/janusgraph-0.6.1.zip
The online docs can be found here:
https://docs.janusgraph.org
To view the resolved issues and commits check the milestone here:
https://github.com/JanusGraph/janusgraph/milestone/22?closed=1
Thank you very much,
Oleksandr Porunov
|
|
Re: High HBase backend 'configuration' row contention
hadoopmarc@...
Hi Tendai,
"Not serializable" sounds as if you pass a JanusGraph instance from the Spark driver to the executor. The function that runs on the Spark executor should call some static function on the singleton object that holds the JanusGraph instance. If the singleton object is called for the first time, locally on each Spark executor, it creates the JanusGraph instance and its static convenience method returns a GraphTraversalSource g. If the executor function runs a second time (on the next partition of your RDD or DataFrame as input) it again calls the convenience function on the singleton object, but now gets a GraphTraversalSource returned from the existing JanusGraph instance. Best wishes, Marc
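The lazily initialised holder described above could look roughly like the following plain-Java sketch. It makes several assumptions: FakeGraph is a hypothetical stand-in for the JanusGraph instance, g() returns a string where a real holder would return a GraphTraversalSource, and the names GraphHolder and openCount are invented for illustration. The point it shows is that only a serializable value (here the config path) has to travel from the driver to the executors, while the heavy object lives in a static field and is created once per JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a per-JVM holder; FakeGraph is a hypothetical stand-in, not a JanusGraph class.
final class GraphHolder {

    // Stand-in for a heavy, non-serializable object such as a graph instance.
    static final class FakeGraph {
        private final String configPath;

        FakeGraph(String configPath) {
            this.configPath = configPath;
        }

        // Stand-in for graph.traversal(); a real holder would return a GraphTraversalSource.
        String traversal() {
            return "g[" + configPath + "]";
        }
    }

    private static final AtomicInteger OPEN_COUNT = new AtomicInteger();
    private static volatile FakeGraph instance;

    private GraphHolder() {
    }

    // Static convenience method: the first call in this JVM creates the instance,
    // later calls reuse it. Only the String argument needs to be serializable.
    static String g(String configPath) {
        FakeGraph local = instance;
        if (local == null) {
            synchronized (GraphHolder.class) {
                local = instance;
                if (local == null) {
                    local = new FakeGraph(configPath);
                    OPEN_COUNT.incrementAndGet();
                    instance = local;
                }
            }
        }
        return local.traversal();
    }

    // How many times the stand-in graph was created in this JVM.
    static int openCount() {
        return OPEN_COUNT.get();
    }

    public static void main(String[] args) {
        // Simulates the executor function running on two partitions in the same JVM.
        System.out.println(g("conf/graph.properties"));
        System.out.println(g("conf/graph.properties"));
        System.out.println("graph opened " + openCount() + " time(s)");
    }
}
```

The function passed to Spark would then call GraphHolder.g(...) inside its body instead of capturing a graph object in its closure, so nothing non-serializable is shipped from the driver.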
|
|
Re: High HBase backend 'configuration' row contention
Tendai Munetsi
Hi Marc,
Thanks for the feedback and suggestion. We investigated applying the JanusGraphFactory inside a singleton object as you've suggested, but ran into the issue that the JanusGraphFactory is not serializable as required for Spark singletons. Do you have any ideas of how to get around this issue? Thanks, Tendai
|
|
Re: Janusgraph embedded multi instance(JVM) data sync issue
hadoopmarc@...
Hi Pawan,
You are right, if issues already arise without index, you should investigate that first, even though a large graph without indices is useless in itself. See the third question from Boxuan Li above, in particular: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlAboutDataConsistency.html Best wishes, Marc
|
|
Re: JG Schema - addConnection seem to create duplicate connections
Peter Molnar
Hi Marc,
Thanks a lot for looking into this. As requested, I filed an issue about this on GitHub: https://github.com/JanusGraph/janusgraph/issues/2950 Thanks, Peter
|
|
Re: Janusgraph embedded multi instance(JVM) data sync issue
Pawan Shriwas
Hi Marc, I don't think the data cache issue is due to Elasticsearch/the mixed index only. I have seen it on a basic property/node without an index as well. Let's work on the basic node/property case first; then we can plan for the mixed index cases. Any suggestions for the basic case without an index backend? Thanks, Pawan
On Sat, Jan 15, 2022 at 5:16 PM <hadoopmarc@...> wrote: Hi Pawan, --
Thanks & Regard PAWAN SHRIWAS
|
|
Re: JG Schema - addConnection seem to create duplicate connections
hadoopmarc@...
Hi Peter,
Thanks for reporting. I think it is a bug. I checked with the standalone gremlin REPL of janusgraph-0.6.0, using:
graph = JanusGraphFactory.open('conf/janusgraph-inmemory.properties')
This gives the same results and if you add the from toEdge connections first, the FromEdge gets 4 connections. You can check that two of the four connections are redundant, that is, they refer to the same edge in the schema:
gremlin> edges[1].mappedConnections()
==>org.janusgraph.core.Connection@1fecfaea
==>org.janusgraph.core.Connection@4872669f
==>org.janusgraph.core.Connection@483f286e
==>org.janusgraph.core.Connection@4bb147ec
gremlin> edges[1].mappedConnections()[0].getConnectionEdge()
==>e[hs0-el-1th-st][525-~T$SchemaRelated->1037]
gremlin> edges[1].mappedConnections()[1].getConnectionEdge()
==>e[ikg-el-1th-171][525-~T$SchemaRelated->1549]
gremlin> edges[1].mappedConnections()[2].getConnectionEdge()
==>e[hs0-el-1th-st][525-~T$SchemaRelated->1037]
gremlin> edges[1].mappedConnections()[3].getConnectionEdge()
==>e[ikg-el-1th-171][525-~T$SchemaRelated->1549]
Finally, I checked that the schema results remain the same if you add the following config properties to the graph (as suggested by the ref docs):
schema.default=none
schema.constraints=true
Can you please report this as an issue on: https://github.com/JanusGraph/janusgraph/issues
Best wishes, Marc
On Tue, Jan 11, 2022 at 01:06 PM, Peter Molnar wrote: mgmt = graph.openManagement();
|
|
Re: Fastest way to check if a property key is mixed indexed or not
hadoopmarc@...
Hi Harshit,
The performance impact for JanusGraph of including a property key in multiple mixed indices is negligible (the selection of the index for a specific query will be a tad slower). Additional mixed indices imply a heavier load on the indexing backend (in particular memory and storage, and CPU during inserts) but with little impact on response times if the cluster is dimensioned properly. Marc
|
|
Re: Janusgraph embedded multi instance(JVM) data sync issue
hadoopmarc@...
Hi Pawan,
OK, let's investigate further. You say that the issue occurs for both vertex creation and modification. Let's take the clearest case first: vertex creation with an indexed property. So, in your system setup, if you have added a new vertex with embedded instance1, it sometimes takes a minute or more before a query for this vertex (based on its property value) on instance2 returns the vertex. This can only mean that the Elasticsearch index sometimes does not return the new property value. This in turn means that an Elasticsearch replica has not yet been synced with the data about the new vertex.
Indeed, the janusgraph-elastic configs have a key index.[X].elasticsearch.bulk-refresh (default: false) which can be set to any of the values in: https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-refresh.html
One can check the correspondence between this JanusGraph config item and the Elasticsearch API parameter in: https://github.com/JanusGraph/janusgraph/blob/v0.6.0/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/rest/RestElasticSearchClient.java
So, can you see what happens with the other possible values for index.[X].elasticsearch.bulk-refresh? Best wishes, Marc
|
|
Re: Fastest way to check if a property key is mixed indexed or not
Harshit Sharma
Will there be any performance impact if I index a property key in multiple indices (mixed indexes)?
On Sat, 15 Jan, 2022, 3:55 pm , <hadoopmarc@...> wrote: Hi Harshit,
|
|
Re: Fastest way to check if a property key is mixed indexed or not
hadoopmarc@...
Hi Harshit,
The concept "property is indexed or not" is ambiguous because an index can have multiple property keys. If you want to know if there is an index with a specific property key as the only key, indeed you would have to do something like in your example code (but modified). Best wishes, Marc
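The distinction made above (a key that appears in some index versus a key that is the only key of an index) can be sketched in plain Java. This is an illustration under stated assumptions, not the JanusGraph API: IndexInfo is a hypothetical stand-in for the index metadata one would read from the management system, and the sample index names are invented. The sketch walks all indexes once to build a lookup by property key, so that repeated checks do not have to traverse the index list again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: IndexInfo is a hypothetical stand-in for real index metadata.
final class IndexLookup {

    static final class IndexInfo {
        final String name;
        final boolean mixed;
        final List<String> fieldKeys;

        IndexInfo(String name, boolean mixed, List<String> fieldKeys) {
            this.name = name;
            this.mixed = mixed;
            this.fieldKeys = fieldKeys;
        }
    }

    // Invented sample data: one single-key composite index, one two-key mixed index.
    static List<IndexInfo> sampleIndexes() {
        return Arrays.asList(
                new IndexInfo("byNameComposite", false, Arrays.asList("name")),
                new IndexInfo("vertexSearch", true, Arrays.asList("name", "graphId")));
    }

    // One pass over all indexes: property key -> indexes that contain it.
    static Map<String, List<IndexInfo>> byPropertyKey(List<IndexInfo> indexes) {
        Map<String, List<IndexInfo>> lookup = new HashMap<>();
        for (IndexInfo index : indexes) {
            for (String key : index.fieldKeys) {
                lookup.computeIfAbsent(key, k -> new ArrayList<>()).add(index);
            }
        }
        return lookup;
    }

    // True if the key is part of at least one mixed index.
    static boolean inAnyMixedIndex(Map<String, List<IndexInfo>> lookup, String key) {
        for (IndexInfo index : lookup.getOrDefault(key, Collections.emptyList())) {
            if (index.mixed) {
                return true;
            }
        }
        return false;
    }

    // True if some index has this key as its only key.
    static boolean isOnlyKeyOfSomeIndex(Map<String, List<IndexInfo>> lookup, String key) {
        for (IndexInfo index : lookup.getOrDefault(key, Collections.emptyList())) {
            if (index.fieldKeys.size() == 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<IndexInfo>> lookup = byPropertyKey(sampleIndexes());
        System.out.println("graphId in a mixed index: " + inAnyMixedIndex(lookup, "graphId"));
        System.out.println("graphId is the only key of an index: "
                + isOnlyKeyOfSomeIndex(lookup, "graphId"));
    }
}
```

Such a lookup would have to be rebuilt whenever the schema changes, so it only pays off when the same check is repeated many times between schema updates.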
|
|
Re: Janusgraph embedded multi instance(JVM) data sync issue
Pawan Shriwas
Hi Marc, I have removed the cache properties from the instances, and we already use a new transaction for each API operation, but we still see stale data in other instances for some time. In my embedded JanusGraph service, we always create a new transaction for each API operation using the code below and commit or roll back at the end of the operation. Sometimes it works and sometimes it does not. Is it a sync kind of issue which varies between graph instances in multiple services (JVMs)?
// Create the graph instance (once, when the service starts)
String filePath = ConfigUtils.getString(GraphConstants.GRAPH_FILE_PATH);
JanusGraph graphinstance = embeddedConnection.open(filePath);
// Create a transaction for each API operation
JanusGraphTransaction threadedTransaction = graphinstance.getGraphInstance().newTransaction();
// Commit or roll back at the end of each API operation
threadedTransaction.commit(); // or threadedTransaction.rollback();
Let me know if anything related to configuration or code needs to be tried for this. Thanks, Pawan
On Fri, Jan 7, 2022 at 1:45 PM <hadoopmarc@...> wrote: Hi Pawan, --
Thanks & Regard PAWAN SHRIWAS
|
|
Fastest way to check if a property key is mixed indexed or not
Harshit Sharma
Is there a way I can check if a particular property is indexed or not? I know the following method, but there I will have to traverse all indexes:
// mgmt is an open JanusGraphManagement
for (JanusGraphIndex index : mgmt.getGraphIndexes(Vertex.class)) {
    for (PropertyKey key : index.getFieldKeys()) {
        if (key.name().equals("KEY1")) return true;
    }
}
return false;
Is there a better way to do the same? -- Regards, Harshit Sharma +91-9901459920
|
|
Re: New Property keys in existing index getting stuck in registered state
Harshit Sharma
Is it allowed to index the same property key in two different indexes? For example, I created a property key graphId and two indexes, vertexIndex and edgeIndex, and indexed graphId in both. The problem I'm facing is that graphId gets enabled in vertexIndex, because I'm creating that one first, but it gets stuck in the REGISTERED state for edgeIndex.
On Wed, Jan 12, 2022 at 6:38 AM Boxuan Li <liboxuan@...> wrote:
--
Regards, Harshit Sharma +91-9901459920
|
|