Indexing Strategies for RDF edges/predicates on JanusGraph


Matthew Nguyen <nguyenm9@...>
 

Hi, I am trying to build a triplestore on top of JG. The general model is:

Vertex (subject or object) Properties:
  Label
  Value (IRI, Literal) - indexed

Edge (predicate) Properties:
  Label (predicate)
  hash - effectively a unique hash of predicate so I can globally index it

So effectively we can have Vertex(subject) -> Edge (predicate) -> Vertex(object)
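
To make the model concrete, here is a rough in-memory sketch in plain Python. It does not use the JanusGraph API. The class names, the `make_triple` helper, the "iri" label value and the choice of SHA-1 over the predicate string are all assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    label: str  # kind of node; "iri" here is a made-up label value
    value: str  # the IRI or literal (the indexed property in the model)


@dataclass(frozen=True)
class Edge:
    subject: Vertex
    label: str  # the predicate
    obj: Vertex
    hash: str   # hash of the predicate, intended for a global index


def predicate_hash(predicate: str) -> str:
    # Assumption: the hash is derived from the predicate string alone.
    return hashlib.sha1(predicate.encode("utf-8")).hexdigest()


def make_triple(subject: str, predicate: str, obj: str) -> Edge:
    # Vertex(subject) -> Edge(predicate) -> Vertex(object)
    return Edge(
        subject=Vertex("iri", subject),
        label=predicate,
        obj=Vertex("iri", obj),
        hash=predicate_hash(predicate),
    )


edge = make_triple("matt", "employedBy", "some_company")
print(edge.subject.value, edge.label, edge.obj.value)
```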

Let's assume I insert the following triples into this model

<matt> <employedBy> <some_company>
<jane> <employedBy> <some_company>
<product1> <isSoldBy> <some_company>
<some_office> <isLeasedBy> <some_company>
etc

Let's say there are roughly 1k different predicates that can be associated with <some_company>, and predicates like <employedBy> can have high cardinality if the company is large. What's a good way to index these edges/predicates so I can quickly query for all edges of a particular type on <some_company> (e.g. 'give me all the ?people <employedBy> <some_company>')?
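
The access pattern I'm after can be sketched in plain Python as a lookup keyed by (object, predicate). This only illustrates the query shape and is not JanusGraph code; the `incoming` dict and the `subjects_of` helper are hypothetical names. In Gremlin terms I believe the same query would be roughly g.V().has('value', 'some_company').in('employedBy'), assuming the vertex property is named 'value'.

```python
from collections import defaultdict

# A few of the example triples as (subject, predicate, object).
triples = [
    ("matt", "employedBy", "some_company"),
    ("jane", "employedBy", "some_company"),
    ("product1", "isSoldBy", "some_company"),
]

# Group incoming edges by (object, predicate) so that one lookup
# returns only the edges of the requested type on that vertex.
incoming = defaultdict(list)
for subject, predicate, obj in triples:
    incoming[(obj, predicate)].append(subject)


def subjects_of(predicate: str, obj: str) -> list:
    # 'give me all the ?people <employedBy> <some_company>'
    return list(incoming.get((obj, predicate), []))


print(subjects_of("employedBy", "some_company"))
```

The point is that the lookup cost should depend on the number of matching edges, not on the total number of edges or distinct predicates attached to <some_company>.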

I'm aware of the vertex-centric indexes on edges, but it appears I would need to build an index for each of the possible edge labels of <some_company>, if I understand the docs correctly (https://docs.janusgraph.org/schema/index-management/index-performance/#edge-indexes). Please correct me if I'm wrong. If I do have it right, is there another strategy I can use?

thx, matt
