Re: Indexing Strategies for RDF edges/predicates on Janusgraph

AMIYA KUMAR SAHOO

Hi Matthew,

The two examples show two different types of default index.

g.V(h).out('mother')
- This is an example of the default vertex-centric index per edge label.
- It helps a traversal quickly follow one specific type of edge among the different edge types on a vertex.
- In your case, finding all employees employedBy a company will use this index.

g.V(h).values('age')
- This is an example of the default vertex-centric index per property key.
- It helps to get the value of a single property among the several properties of a single vertex.

Now consider a situation where one vertex (a company) has 1k types of edges. Apart from the employedBy edge, the other edge types have low cardinality (say, fewer than 10 edges each), but 2k employees are employedBy that company. You want to find out whether the company has an employee named John. If your traversal starts from the company and follows the employedBy edge, it has to traverse all 2k edges to find out whether John is an employee. This can be made faster if the employee name is available as a property on the edge and a vertex-centric index (VCI) is enabled on it.

This might not be a very good example, as it can be optimised in different ways:
1) If the employee vertex has a low degree for the employedBy edge, you can start the traversal from the employee vertex.
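
To make the difference concrete, here is a toy in-memory model in plain Java. It is not JanusGraph code and says nothing about how JanusGraph actually stores edges; the class `ToyVertexEdges` and its methods are made up for illustration. It only shows why grouping edges by label makes the label lookup direct, while finding John among the employedBy edges still needs a scan unless there is an extra structure keyed on the name as well. In JanusGraph itself such an index is declared through the management API (see `buildEdgeIndex` in the edge-index docs).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy model of the edges incident to one vertex (hypothetical, not JanusGraph code). */
final class ToyVertexEdges {
    // Edges grouped by label only: stands in for the default per-label index.
    private final Map<String, List<String>> byLabel = new HashMap<>();
    // Sorted names per label: stands in for a vertex-centric index on (label, name).
    private final Map<String, TreeSet<String>> namesByLabel = new HashMap<>();

    void addEdge(String label, String name) {
        byLabel.computeIfAbsent(label, k -> new ArrayList<>()).add(name);
        namesByLabel.computeIfAbsent(label, k -> new TreeSet<>()).add(name);
    }

    /** Scans the edges of one label and returns how many were inspected. */
    int scanForName(String label, String name) {
        int inspected = 0;
        for (String candidate : byLabel.getOrDefault(label, Collections.emptyList())) {
            inspected++;
            if (candidate.equals(name)) {
                break;
            }
        }
        return inspected;
    }

    /** Answers the same question from the sorted name set, without a scan. */
    boolean hasEdgeWithName(String label, String name) {
        TreeSet<String> names = namesByLabel.get(label);
        return names != null && names.contains(name);
    }

    public static void main(String[] args) {
        ToyVertexEdges company = new ToyVertexEdges();
        for (int i = 0; i < 1999; i++) {
            company.addEdge("employedBy", "employee" + i);
        }
        company.addEdge("employedBy", "John"); // worst case: John is the last edge
        System.out.println(company.scanForName("employedBy", "John"));
        System.out.println(company.hasEdgeWithName("employedBy", "John"));
    }
}
```

In this model the scan has to inspect all 2000 employedBy edges before it reaches John, while the lookup keyed on the name does not touch the other edges.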

Hope it helps,
Amiya

On Tue, 25 Jan 2022, 00:01 Matthew Nguyen via lists.lfaidata.foundation, <nguyenm9=aol.com@...> wrote:

Hi Amiya, I saw that but wasn't quite sure of the intent given the example.  It talks about edge labels, but the examples are vertices & values?

g.V(h).out('mother') -> returns a vertex traversal?
g.V(h).values('age') -> returns a Value?

Also, what do you mean by 'But if you have a high cardinality for a single edge type, then you have to manually create edge index on respective property.'?

thx, matt

Re: Indexing Strategies for RDF edges/predicates on Janusgraph

Matthew Nguyen <nguyenm9@...>

Hi Amiya, I saw that but wasn't quite sure of the intent given the example.  It talks about edge labels, but the examples are vertices & values?

g.V(h).out('mother') -> returns a vertex traversal?
g.V(h).values('age') -> returns a Value?

Also, what do you mean by 'But if you have a high cardinality for a single edge type, then you have to manually create edge index on respective property.'?

thx, matt

Re: Indexing Strategies for RDF edges/predicates on Janusgraph

AMIYA KUMAR SAHOO

Hi Matthew,

As per the note below from the JanusGraph docs, even if a company has 1k different types of edges attached to it, traversing by edge label will be fast.

For example, finding the employees employedBy (edge label) a company.

But if you have a high cardinality for a single edge type, then you have to manually create edge index on respective property.

JanusGraph automatically builds vertex-centric indexes per edge label and property key. That means, even with thousands of incident `battled` edges, queries like `g.V(h).out('mother')` or `g.V(h).values('age')` are efficiently answered by the local index.

Thanks,
Amiya

On Mon, 24 Jan 2022, 12:32 , <hadoopmarc@...> wrote:
Hi Matthew,

It would be possible to replace the employedBy, isSoldBy, isLeasedBy relations with a relatedToCompany relation with employment, selling and lease properties. But I do not see any advantages compared to the original model, because there is nothing wrong with a lot of frequently used vertex-centric indices and the original model is easier to use.

Cheers,     Marc

Re: Indexing Strategies for RDF edges/predicates on Janusgraph

Hi Matthew,

It would be possible to replace the employedBy, isSoldBy, isLeasedBy relations with a relatedToCompany relation with employment, selling and lease properties. But I do not see any advantages compared to the original model, because there is nothing wrong with a lot of frequently used vertex-centric indices and the original model is easier to use.

Cheers,     Marc

Indexing Strategies for RDF edges/predicates on Janusgraph

Matthew Nguyen <nguyenm9@...>

Hi, I am trying to build a triplestore on top of JG.  The general model is:

Vertex (subject or object) Properties:
Label
Value (IRI, Literal) - indexed

Edge (predicate) Properties:
Label (predicate)
hash - effectively a unique hash of predicate so I can globally index it

So effectively we can have Vertex(subject) -> Edge (predicate) -> Vertex(object)

Let's assume I insert the following triples into this model

<matt> <employedBy> <some_company>
<jane> <employedBy> <some_company>
<product1> <isSoldBy> <some_company>
<some_offce> <isLeasedBy> <some_company>
etc

Let's say there are literally 1k different predicates that can be associated with <some_company>, and predicates like <employedBy> can have high cardinality if the company is large.  What's a good way to index these edges/predicates so I can quickly query for all edges of a particular type on <some_company> (e.g. 'give me all the ?people <employedBy> <some_company>')?

I'm aware of the vertex-centric indexes on edges but it appears I would need to build an index for each of the possible edge labels of <some_company> if I understand the docs correctly (https://docs.janusgraph.org/schema/index-management/index-performance/#edge-indexes).  Please correct me if I'm wrong.  If not, is there another strategy I can use?

thx, matt

Re: Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas

Hi Marc,

However, I am testing this in a local environment with a replication factor of one. I believe that with a replication factor of one, the other instances should see the same data in all cases.

See the local properties file below:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1
storage.cql.keyspace=janusgraph
storage.port=9042
schema.constraints=true
############ CQL Properties ############

storage.cql.write-consistency-level=LOCAL_QUORUM
storage.cql.replication-factor=1

Please see the attached API code for create, update and get in the local sample application. Let me know if something is wrong here, because the refresh of data is not working on another embedded instance with the same configuration.

Thanks,
Pawan

On Thu, Jan 20, 2022 at 12:44 PM <hadoopmarc@...> wrote:
Hi Pawan,

You are right, if issues already arise without index, you should investigate that first, even though a large graph without indices is useless in itself.
See the third question from Boxuan Li above, in particular:

Best wishes,   Marc

--
Thanks & Regard

PAWAN SHRIWAS

[ANNOUNCE] JanusGraph 0.6.1 Release

Oleksandr Porunov

The JanusGraph Technical Steering Committee is excited to announce the release of JanusGraph 0.6.1.

JanusGraph is an Apache TinkerPop enabled property graph database with support for a variety of storage and indexing backends. Thank you to all of the contributors.

The release artifacts can be found at this location:
https://github.com/JanusGraph/janusgraph/releases/tag/v0.6.1

A full binary distribution is provided for user convenience:

A truncated binary distribution is provided:

The online docs can be found here:
https://docs.janusgraph.org

To view the resolved issues and commits check the milestone here:
https://github.com/JanusGraph/janusgraph/milestone/22?closed=1

Thank you very much,
Oleksandr Porunov

Re: High HBase backend 'configuration' row contention

Hi Tendai,

"Not serializable" sounds as if you pass a JanusGraph instance from the Spark driver to the executor. The function that runs on the Spark executor should call some static function on the singleton object that holds the JanusGraph instance. If the singleton object is called for the first time, locally on each Spark executor, it creates the JanusGraph instance and its static convenience method returns a GraphTraversalSource g. If the executor function runs a second time (on the next partition of your RDD or DataFrame as input) it again calls the convenience function on the singleton object, but now gets a GraphTraversalSource returned from the existing JanusGraph instance.

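A minimal sketch of that holder pattern in plain Java, assuming a stand-in `FakeGraph` class in place of the real JanusGraph instance (both class names are hypothetical, and no Spark or JanusGraph API is used). The point is only that the static field is initialised inside each JVM on first use, so the function shipped to the executors never has to capture or serialize the graph itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Holder whose static state lives in each JVM, so nothing has to be serialized. */
final class GraphHolder {
    private static FakeGraph graph; // created lazily, once per JVM

    private GraphHolder() {
    }

    /** Static convenience method: opens the graph on first call, reuses it afterwards. */
    static synchronized FakeGraph get() {
        if (graph == null) {
            graph = new FakeGraph();
        }
        return graph;
    }

    public static void main(String[] args) {
        FakeGraph first = GraphHolder.get();  // first partition: instance is created
        FakeGraph second = GraphHolder.get(); // next partition: same instance is returned
        System.out.println(first == second);
        System.out.println(FakeGraph.OPENED.get());
    }
}

/** Stand-in for a heavyweight, non-serializable resource (hypothetical type). */
final class FakeGraph {
    static final AtomicInteger OPENED = new AtomicInteger();

    FakeGraph() {
        OPENED.incrementAndGet(); // counts how often the resource was really opened
    }
}
```

The executor function would then call `GraphHolder.get()` inside its body instead of referring to a graph object created on the driver.
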
Best wishes,     Marc

Re: High HBase backend 'configuration' row contention

Tendai Munetsi

Hi Marc,

Thanks for the feedback and suggestion. We investigated applying the JanusGraphFactory inside a singleton object as you've suggested, but ran into the issue that the JanusGraphFactory is not serializable as required for Spark singletons. Do you have any ideas of how to get around this issue?

Thanks,
Tendai

Re: Janusgraph embedded multi instance(JVM) data sync issue

Hi Pawan,

You are right, if issues already arise without index, you should investigate that first, even though a large graph without indices is useless in itself.
See the third question from Boxuan Li above, in particular:

Best wishes,   Marc

Re: JG Schema - addConnection seem to create duplicate connections

Peter Molnar

Hi Marc,

Thanks a lot for looking into this. As requested, I filed an issue about this on GitHub: https://github.com/JanusGraph/janusgraph/issues/2950

Thanks,
Peter

Re: Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas

Hi Marc,

I don't think the data caching is caused only by Elasticsearch/the mixed index. I have seen this on a basic property/node without an index as well. I suggest we work on the basic node/property case first and then plan for the mixed index cases.

Any suggestions for the basic case without an index backend?

Thanks,
Pawan

On Sat, Jan 15, 2022 at 5:16 PM <hadoopmarc@...> wrote:
Hi Pawan,

OK, let's investigate further. You say that the issue occurs for both vertex creation and modification. Let's take the clearest case first: vertex creation with an indexed property. So, in your system setup, if you have added a new vertex with embedded instance1, sometimes it takes a minute or more before a query for this vertex (based on its property value) on instance2 returns the vertex. This can only mean that the Elasticsearch index sometimes does not return the new property value. This in turn means that an Elasticsearch replica has not yet been synced with the data about the new vertex.

Indeed, the janusgraph-elastic configs have a key index.[X].elasticsearch.bulk-refresh (default: false) which can be set to any of the values in:
https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-refresh.html

One can check the correspondence between this janusgraph config item and the elasticsearch API parameter in:
https://github.com/JanusGraph/janusgraph/blob/v0.6.0/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/rest/RestElasticSearchClient.java

So, can you see what happens with the other possible values for index.[X].elasticsearch.bulk-refresh?

Best wishes,    Marc

--
Thanks & Regard

PAWAN SHRIWAS

Re: JG Schema - addConnection seem to create duplicate connections

Hi Peter,

Thanks for reporting. I think it is a bug. I checked with the standalone gremlin REPL of janusgraph-0.6.0, using:
graph = JanusGraphFactory.open('conf/janusgraph-inmemory.properties')

This gives the same results and if you add the from toEdge connections first, the FromEdge gets 4 connections.

You can check that two of the four connections are redundant, that is, they refer to the same edge in the schema:

gremlin> edges[1].mappedConnections()
==>org.janusgraph.core.Connection@1fecfaea
==>org.janusgraph.core.Connection@4872669f
==>org.janusgraph.core.Connection@483f286e
==>org.janusgraph.core.Connection@4bb147ec
gremlin> edges[1].mappedConnections()[0].getConnectionEdge()
==>e[hs0-el-1th-st][525-~T$SchemaRelated->1037]
gremlin> edges[1].mappedConnections()[1].getConnectionEdge()
==>e[ikg-el-1th-171][525-~T$SchemaRelated->1549]
gremlin> edges[1].mappedConnections()[2].getConnectionEdge()
==>e[hs0-el-1th-st][525-~T$SchemaRelated->1037]
gremlin> edges[1].mappedConnections()[3].getConnectionEdge()
==>e[ikg-el-1th-171][525-~T$SchemaRelated->1549]

Finally, I checked that the schema results remain the same if you add the following config properties to the graph (as suggested by the ref docs):
schema.default=none
schema.constraints=true

Can you please report this as an issue on: https://github.com/JanusGraph/janusgraph/issues

Best wishes,   Marc

On Tue, Jan 11, 2022 at 01:06 PM, Peter Molnar wrote:
mgmt = graph.openManagement();

Re: Fastest way to check if a property key is mixed indexed or not

Hi Harshit,

The performance impact for JanusGraph of including a property key in multiple mixed indices is negligible (the selection of the index for a specific query will be a tad slower). Additional mixed indices imply a heavier load on the indexing backend (in particular memory and storage, and CPU during inserts), but with little impact on response times if the cluster is dimensioned properly.

Marc

Re: Janusgraph embedded multi instance(JVM) data sync issue

Hi Pawan,

OK, let's investigate further. You say that the issue occurs for both vertex creation and modification. Let's take the clearest case first: vertex creation with an indexed property. So, in your system setup, if you have added a new vertex with embedded instance1, sometimes it takes a minute or more before a query for this vertex (based on its property value) on instance2 returns the vertex. This can only mean that the Elasticsearch index sometimes does not return the new property value. This in turn means that an Elasticsearch replica has not yet been synced with the data about the new vertex.

Indeed, the janusgraph-elastic configs have a key index.[X].elasticsearch.bulk-refresh (default: false) which can be set to any of the values in:
https://www.elastic.co/guide/en/elasticsearch/reference/7.16/docs-refresh.html

One can check the correspondence between this janusgraph config item and the elasticsearch API parameter in:
https://github.com/JanusGraph/janusgraph/blob/v0.6.0/janusgraph-es/src/main/java/org/janusgraph/diskstorage/es/rest/RestElasticSearchClient.java

So, can you see what happens with the other possible values for index.[X].elasticsearch.bulk-refresh?

Best wishes,    Marc

Re: Fastest way to check if a property key is mixed indexed or not

Harshit Sharma

Will there be any performance impact if I index a property key in multiple indices (mixed indexes)?

On Sat, 15 Jan, 2022, 3:55 pm , <hadoopmarc@...> wrote:
Hi Harshit,

The concept "property is indexed or not" is ambiguous because an index can have multiple property keys. If you want to know if there is an index with a specific property key as the only key, indeed you would have to do something like in your example code (but modified).

Best wishes,   Marc

Re: Fastest way to check if a property key is mixed indexed or not

Hi Harshit,

The concept "property is indexed or not" is ambiguous because an index can have multiple property keys. If you want to know if there is an index with a specific property key as the only key, indeed you would have to do something like in your example code (but modified).
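
To illustrate the two readings, here is a small sketch in plain Java. It assumes the index definitions have already been collected into a map from index name to field key names (a made-up representation, not a JanusGraph type); `IndexKeyCheck` and its methods are hypothetical helpers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helpers over a map of index name -> field key names. */
final class IndexKeyCheck {
    private IndexKeyCheck() {
    }

    /** Reading 1: the key appears in at least one index, possibly next to other keys. */
    static boolean inAnyIndex(Map<String, List<String>> indexes, String key) {
        for (List<String> keys : indexes.values()) {
            if (keys.contains(key)) {
                return true;
            }
        }
        return false;
    }

    /** Reading 2: some index has this key as its only field key. */
    static boolean isSoleKeyOfSomeIndex(Map<String, List<String>> indexes, String key) {
        for (List<String> keys : indexes.values()) {
            if (keys.size() == 1 && keys.get(0).equals(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> indexes = new LinkedHashMap<>();
        indexes.put("byNameAndAge", Arrays.asList("name", "age")); // example data only
        indexes.put("byGraphId", Collections.singletonList("graphId"));
        System.out.println(inAnyIndex(indexes, "age"));
        System.out.println(isSoleKeyOfSomeIndex(indexes, "age"));
    }
}
```

With this example data, "age" is indexed under the first reading but not under the second, because it only occurs together with "name".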

Best wishes,   Marc

Re: Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas

Hi Marc,

I have removed the cache properties from the instances, and we already use a new transaction for each API operation, but we are still seeing stale data in other instances for some time.

Below is the code which is used for the new transaction for each operation.

In my embedded JanusGraph service, we always create a new transaction for each API operation using the code below and commit or roll back at the end of the operation, but sometimes it works and sometimes it does not. Is it a sync issue that varies between graph instances in multiple services (JVMs)?

// Create graph instance code(once service start)
String filePath = ConfigUtils.getString(GraphConstants.GRAPH_FILE_PATH);
JanusGraph graphinstance = embeddedConnection.open(filePath);

// create transaction code for each api operation

// we do commit or rollback at end of each api operation
//or

Let me know if anything related to configuration or any code needs to be tried for the same.

Thanks,
Pawan

On Fri, Jan 7, 2022 at 1:45 PM <hadoopmarc@...> wrote:
Hi Pawan,

Your requirement for instant synchronization cannot work with JanusGraph caches enabled, because JanusGraph will get data from the cache if available, instead of getting the latest data from the backend. So,

• cache.db-cache = false
• be sure to start a new transaction before querying for the latest data (e.g. by executing a g.tx().commit())
Best wishes,    Marc

--
Thanks & Regard

PAWAN SHRIWAS

Fastest way to check if a property key is mixed indexed or not

Harshit Sharma

Is there a way I can check if a particular property is indexed or not?

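I know the following method, but with it I have to traverse all indexes:

import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphIndex;

// mgmt is an open JanusGraphManagement, e.g. from graph.openManagement()
for (JanusGraphIndex index : mgmt.getGraphIndexes(Vertex.class)) {
    for (PropertyKey key : index.getFieldKeys()) {
        if (key.name().equals("KEY1")) {
            return true;
        }
    }
}
return false;

Is there a better way to do the same?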
--
Regards,

Harshit Sharma
+91-9901459920

Re: New Property keys in existing index getting stuck in registered state

Harshit Sharma

Is it allowed to index the same property key in two different indexes?
For example, I created a property key graphId and two indexes, vertexIndex and edgeIndex.
I indexed graphId in both indexes.

The problem I'm facing is that graphId gets enabled in vertexIndex, because I'm creating it first, but it gets stuck in the REGISTERED state for edgeIndex.

On Wed, Jan 12, 2022 at 6:38 AM Boxuan Li <liboxuan@...> wrote:
Can you post the stacktrace (or the place where NPE is thrown)?

On Wed, Jan 12, 2022 at 2:50 AM Harshit Sharma <harshit.sharma1080@...> wrote:
That is not working. According to the documentation: https://docs.janusgraph.org/schema/index-management/index-performance/
After building the index I'm calling ManagementSystem.awaitGraphIndexStatus(graph, INDEX_NAME).call(),
but this call throws a NullPointerException.

On Tue, Jan 11, 2022 at 8:18 PM Boxuan Li <liboxuan@...> wrote:
Hi Harshit,

Can you check if

Best,
Boxuan

On Tue, Jan 11, 2022 at 10:43 PM Harshit Sharma <harshit.sharma1080@...> wrote:
Sorry, posting again because of a typo:

If I'm adding new keys to an existing index, many keys get stuck in the REGISTERED or INSTALLED state.
For example, I indexed a key "graphId" for vertices in an index "graphVertexIndex"
and I indexed the same key for edges in an index named "graphEdgeIndex".
Now it is getting enabled in "graphVertexIndex" but stuck in the REGISTERED state in "graphEdgeIndex".

On Tue, Jan 11, 2022 at 8:10 PM Harshit Sharma via lists.lfaidata.foundation <harshit.sharma1080=gmail.com@...> wrote:
If I'm adding new keys to an existing index, many keys are getting stuck in the registered or installed state
for example, I indexed a key "graphId" for the vertex in an index graphVertexIndex
while I indexed the same key for an edge in graphEdgeIndex
it is getting enabled in graphVertexIndex but stuck in the Registered state

--
Regards,

Harshit Sharma
+91-9901459920

