Thanks for explaining; I was already afraid I had missed something. I think each type of query has its own optimal treatment. When you have two properties to select on, you have these cases:
- Small result set (say, fewer than 1000 vertices): this is served well by the default CompositeIndex or MixedIndex on these two property keys.
- Large result set: here it is probably more efficient to work with stored vertex ids. However, instead of a flat list, you now store the ids in a dictionary keyed by the values of p2. Your query then becomes g.V(ids[p2_value]).
If you work out what works best, it would be interesting to read about your results in a blog post!
By the way, your use case of retrieving a large set of vertices as the start of a traversal may be better served by PostgreSQL or some other linearly scalable SQL store. JanusGraph shines at longer traversals from a small number of starting vertices.
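A minimal sketch of the "dictionary keyed by p2" idea, written as a plain Python toy model rather than JanusGraph code. The vertex data and the helper name `group_ids_by_p2` are hypothetical; in practice the ids would come from the graph itself, and only the final lookup result would be passed to g.V(...).

```python
from collections import defaultdict

# Hypothetical toy data standing in for the graph: vertex id -> properties.
vertices = {
    1: {"p1": "x", "p2": "a"},
    2: {"p1": "x", "p2": "b"},
    3: {"p1": "y", "p2": "a"},
    4: {"p1": "x", "p2": "a"},
}


def group_ids_by_p2(vertices, p1_value):
    """Collect the ids of vertices with p1 == p1_value, keyed by their p2 value.

    Meant to run once (e.g. at ingestion) so the result can be stored and reused.
    """
    ids = defaultdict(list)
    for vid, props in vertices.items():
        if props.get("p1") == p1_value:
            ids[props.get("p2")].append(vid)
    return dict(ids)


ids = group_ids_by_p2(vertices, "x")
print(ids)  # {'a': [1, 4], 'b': [2]}

# The start ids for one p2 value are now a plain dictionary lookup;
# in Gremlin the traversal would then start with g.V(ids[p2_value]).
start_ids = ids.get("a", [])
print(start_ids)  # [1, 4]
```

Using `ids.get(value, [])` rather than `ids[value]` keeps a p2 value with no matching vertices from raising a KeyError.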
Best wishes, Marc
On Monday, 7 October 2019 15:58:36 UTC+2, Lilly wrote:
I guess I did not explain my issue very well.
What I meant to say is this: suppose these ids correspond to some filtering criterion. Having these ids, I can create the subgraph.
However, if I then want to use another index on this subgraph, say the one on "property" (not the one related to the filtering criterion), it will not be used.
A (hopefully) simple example.
Say I have a graph with properties p1 and p2 and indices on both. Now I get the ids of all vertices that have p1=x and store them in ids.
Now g.V(ids).has(p2,...) will not make use of the index on p2. At least, it does not show up in the profile() step.
Is it clear now what I mean? Or am I mistaken?
On Monday, 7 October 2019 15:49:01 UTC+2, ma...@... wrote:
When you have the vertex id, you do not need any index. The index is a lookup table from property value to vertex id.
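To illustrate the point that an index is just a lookup from property value to vertex ids, here is a toy Python model (not JanusGraph internals; the data and the helper names `build_index`, `lookup_by_value` and `filter_known_ids` are hypothetical). It contrasts an index lookup, the analogue of g.V().has(...), with filtering vertices that are already known by id, the analogue of g.V(ids).has(...).

```python
# Hypothetical toy data: vertex id -> properties (not real JanusGraph ids).
vertices = {
    1: {"p1": "x", "p2": "a"},
    2: {"p1": "x", "p2": "b"},
    3: {"p1": "y", "p2": "a"},
    4: {"p1": "x", "p2": "a"},
}


def build_index(vertices, key):
    """Toy index: maps each value of `key` to the set of vertex ids having it."""
    index = {}
    for vid, props in vertices.items():
        if key in props:
            index.setdefault(props[key], set()).add(vid)
    return index


p2_index = build_index(vertices, "p2")


def lookup_by_value(index, value):
    # Analogue of g.V().has("p2", value): one lookup from value to ids.
    return index.get(value, set())


def filter_known_ids(vertices, ids, key, value):
    # Analogue of g.V(ids).has("p2", value): each vertex is fetched by id
    # and its property is checked; the index is not consulted at all.
    return {vid for vid in ids if vertices[vid].get(key) == value}


ids = [1, 2, 4]  # e.g. the vertices with p1 == "x"
print(lookup_by_value(p2_index, "a"))              # {1, 3, 4}
print(filter_known_ids(vertices, ids, "p2", "a"))  # {1, 4}
```

Intersecting the two, `set(ids) & lookup_by_value(p2_index, "a")`, gives the same `{1, 4}`. In this toy model that only pays off when the index lookup returns fewer ids than the list being filtered.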
On Monday, 7 October 2019 08:15:50 UTC+2, Lilly wrote:
Thanks for your reply!
Your suggestions would fetch the subgraph efficiently. However, on this subgraph I could no longer use any of my other indices.
Say I have an index on "property". Then g.V(ids).has("property",...) would no longer make use of the index on "property" (only g.V().has("property",...) does).
This would be desirable, though, especially if the subgraph is still rather large.
Any thoughts on how to achieve this?
On Sunday, 6 October 2019 09:47:25 UTC+2, ma...@... wrote:
Interesting question. For the JanusGraph backends to look up the vertices of the subgraph efficiently, they need the ids of the vertices. The traversal is then g.V(ids). There are different ways to get these ids:
- store the ids on ingestion
- query the ids once and store them
- give the subgraph vertices a specific property and put an index on that property. I doubt, however, that this will be efficient for large subgraphs. Has anyone ever tried this?
- maybe the JanusGraph IDPlacementStrategy could provide a way to query only the subgraph vertices without knowing their explicit ids. This seems complicated compared to the first two options.
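A minimal sketch of the second option, "query the ids once and store them", assuming the ids have already been fetched with a one-off query (for example something like g.V().has('p1', x).id().toList(); illustrative only). The file name and the ids themselves are hypothetical; JanusGraph vertex ids are numeric by default, so they serialize cleanly to JSON.

```python
import json
import os
import tempfile


def save_ids(path, ids_by_value):
    """Persist the ids once, e.g. right after a one-off query."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ids_by_value, f)


def load_ids(path):
    """Reload the stored ids so the query does not have to be rerun."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical ids, standing in for the result of a one-off query.
ids_by_value = {"a": [4104, 8200], "b": [12296]}

path = os.path.join(tempfile.mkdtemp(), "subgraph_ids.json")
save_ids(path, ids_by_value)
reloaded = load_ids(path)
print(reloaded == ids_by_value)  # True
```

JSON object keys are always strings, so if the grouping values are not strings they need converting on save and load. The stored ids also go stale when vertices are added or removed, so they have to be refreshed whenever the subgraph changes.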
On Friday, 4 October 2019 17:48:52 UTC+2, Lilly wrote:
I persisted a JanusGraph g1 (with a Cassandra backend, if that is relevant). Now I would like to persist a "view" of this graph g1, i.e. a subgraph g2 of g1 which only contains some of the nodes and edges of g1. This subgraph should also have all the indices of the affected nodes and edges.
I am aware of SubgraphStrategy, which can create such a view at runtime. Is it possible to persist this view? I would like to avoid having to create this view all over again each time. Also, with this view created at runtime, I can no longer exploit other indices.
If this is not possible, is there another way to achieve this?
Thanks a lot!!