
Re: Forcing Janusgraph to use indices when performing traversal with Union step

brad@...
 

Thank you for your reply.

We can look into migrating to version 0.6 (or later)...

There exists an index for indexed-prop1, indexed-prop2, and indexed-prop3, but there are no multi-column indices (e.g., indexed-prop1 and indexed-prop2).  In our application, the columns that might be used for the search are specified by the user, and we wouldn't have any way of knowing in advance what combination of columns they might ask for.  The only thing that we can do now is to ensure that all of the columns that are used in a 'has' step are indexed.  

One approach that I thought might be an improvement is to run the first traversal (i.e., "g.V().has('indexed-prop1', 'value1')") separately, collect the set of vertices that satisfy it, and then run one additional query for each 'has' step that is currently inside the 'union' step. For example, we run g.V().has('indexed-prop1', 'value1') first, and it returns 3 vertices (V1, V2, and V3). We then run a traversal for indexed-prop2, "g.V(V1, V2, V3).has('indexed-prop2', 'value2')", which returns a subset of the vertices returned by the first traversal. Then we run another traversal, "g.V(V1, V2, V3).has('indexed-prop3', 'value3')", which again returns a subset. Finally, we compute the union of the vertices returned by the last two traversals programmatically, without executing another traversal. This is an inelegant, brute-force technique that would probably work, but I would rather do this in a single traversal, and I haven't been able to figure out how. Can you recommend an approach for doing this kind of action in a single traversal?
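The client-side part of this fallback (the final union step) can be sketched in plain Java. This is a minimal sketch, assuming the vertex ids from each follow-up query have already been collected into lists; the class and method names (`ClientSideUnion`, `unionOf`) and the example ids are hypothetical, and the graph queries themselves are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ClientSideUnion {
    // Hypothetical helper: merges the vertex ids returned by the per-property
    // follow-up queries into one list, dropping duplicates and keeping the
    // order in which ids were first seen.
    @SafeVarargs
    static List<Long> unionOf(List<Long>... partialResults) {
        Set<Long> merged = new LinkedHashSet<>();
        for (List<Long> part : partialResults) {
            merged.addAll(part);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Hypothetical ids standing in for V1, V2, V3 from g.V().has('indexed-prop1', 'value1')
        List<Long> candidates = Arrays.asList(1L, 2L, 3L);
        // Hypothetical result of g.V(V1, V2, V3).has('indexed-prop2', 'value2')
        List<Long> matchProp2 = Arrays.asList(1L, 3L);
        // Hypothetical result of g.V(V1, V2, V3).has('indexed-prop3', 'value3')
        List<Long> matchProp3 = Arrays.asList(3L);

        List<Long> result = unionOf(matchProp2, matchProp3);
        // Sanity check: every id in the union must come from the first (indexed) traversal.
        if (!candidates.containsAll(result)) {
            throw new IllegalStateException("union contains ids outside the candidate set");
        }
        System.out.println(result); // prints [1, 3]
    }
}
```

A LinkedHashSet is used so that a vertex matching both predicates (the hypothetical V3 here) is reported only once, while the order of first appearance is preserved.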

Thanks,
Brad


Re: Forcing Janusgraph to use indices when performing traversal with Union step

Boxuan Li
 

Hi Brad,

I can see that your traversal is using the index index_ten1_apm_0 from the following snippet:

  backend-query                                                        3                      21.484
    \_query=index_ten1_apm_0:[(ten1.apm.idx.type_id = i_javaServiceInstance)](4000):index_ten1_apm_0
    \_limit=4000

Then, JanusGraph uses in-memory filtering to check whether the results returned by index_ten1_apm_0 satisfy your predicates has('indexed-prop2', 'value2') or has('indexed-prop3', 'value3').

Do you have an index that contains both the fields “indexed-prop1” and “indexed-prop2”, and an index that contains both “indexed-prop1” and “indexed-prop3”? If so, try replacing the “union” step with an “or” step. If not, you could try creating those indexes; otherwise, there is no optimization that JanusGraph can do.
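One behavioral difference to keep in mind when swapping “union” for “or” can be illustrated with a small in-memory Java model. This is only a sketch that mimics the step semantics, not JanusGraph or TinkerPop code: the class `UnionVsOr`, its `Vertex` holder and the sample property values are hypothetical, and the output order is not meant to reproduce TinkerPop's traverser ordering. It shows that `union(has(a), has(b))` sends each vertex down both branches, so a vertex matching both predicates is emitted twice, whereas `or(has(a), has(b))` is a single filter that passes each vertex at most once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class UnionVsOr {
    // Hypothetical stand-in for a vertex: an id plus its property values.
    static final class Vertex {
        final long id;
        final Map<String, String> props;
        Vertex(long id, Map<String, String> props) {
            this.id = id;
            this.props = props;
        }
    }

    // Mimics has(key, value) as a predicate on a vertex.
    static Predicate<Vertex> has(String key, String value) {
        return v -> value.equals(v.props.get(key));
    }

    // Mimics union(has(a), has(b)): each branch sees every incoming vertex,
    // so a vertex satisfying both branches yields two results.
    static List<Long> union(List<Vertex> in, Predicate<Vertex> a, Predicate<Vertex> b) {
        List<Long> out = new ArrayList<>();
        for (Vertex v : in) {
            if (a.test(v)) out.add(v.id);
            if (b.test(v)) out.add(v.id);
        }
        return out;
    }

    // Mimics or(has(a), has(b)): one filter, so each vertex passes at most once.
    static List<Long> or(List<Vertex> in, Predicate<Vertex> a, Predicate<Vertex> b) {
        return in.stream().filter(a.or(b)).map(v -> v.id).collect(Collectors.toList());
    }

    // Builds a hypothetical vertex with values for indexed-prop2 and indexed-prop3.
    static Vertex vertex(long id, String p2, String p3) {
        Map<String, String> props = new HashMap<>();
        props.put("indexed-prop2", p2);
        props.put("indexed-prop3", p3);
        return new Vertex(id, props);
    }

    public static void main(String[] args) {
        List<Vertex> candidates = Arrays.asList(
            vertex(1, "value2", "other"),
            vertex(2, "other", "other"),
            vertex(3, "value2", "value3"));
        Predicate<Vertex> a = has("indexed-prop2", "value2");
        Predicate<Vertex> b = has("indexed-prop3", "value3");
        System.out.println("union: " + union(candidates, a, b)); // prints union: [1, 3, 3]
        System.out.println("or:    " + or(candidates, a, b));    // prints or:    [1, 3]
    }
}
```

So the two forms can return different result counts when a vertex matches more than one branch; if duplicates were never wanted, the “or” form yields the intended set directly.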

Btw it looks like you are using a JanusGraph version < 0.6, which is out of maintenance. The same query should run moderately faster in the latest version due to a couple of optimizations.

Best,
Boxuan

On Feb 6, 2022, at 3:14 PM, brad@... wrote:

TRAVERSAL:
[GraphStep(vertex,[]), HasStep([~label.eq(ten1.apm.version), ten1.apm.idx.type_id.eq(i_javaServiceInstance)]), UnionStep([[HasStep([ten1.apm.idx.display_name.eq(JavaServiceInstance3)]), EndStep], [HasStep([ten1.apm.idx.str4.eq(998)]), EndStep]]), RangeGlobalStep(0,2147483647), HasStep([ten1.apm.idx.discovery_ts.lte(1643313164000), ten1.apm.idx.last_seen_ts.gt(0)]), OrderGlobalStep([[value(ten1.apm.idx.last_seen_ts), desc]]), RangeGlobalStep(0,1000), GroupStep(value(uid),[FoldStep, OrderLocalStep([[value(ten1.apm.idx.last_seen_ts), desc]]), UnfoldStep, RangeGlobalStep(0,1)]), LambdaFlatMapStep(lambda), RangeGlobalStep(0,2500), PropertyMapStep([uid, ten1.apm.idx.display_name, ten1.apm.idx.last_seen_ts, ten1.apm.idx.discovery_ts, typ],value)]
Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(ten1.apm.version),...                     3           3          14.381    54.44
    \_condition=(~label = ten1.apm.version AND ten1.apm.idx.type_id = i_javaServiceInstance)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=[(ten1.apm.idx.type_id = i_javaServiceInstance)](4000):index_ten1_apm_0
    \_index=index_ten1_apm_0
    \_index_impl=search
  optimization                                                                                 0.011
  optimization                                                                                 0.381
  backend-query                                                        3                      21.484
    \_query=index_ten1_apm_0:[(ten1.apm.idx.type_id = i_javaServiceInstance)](4000):index_ten1_apm_0
    \_limit=4000
UnionStep([[HasStep([ten1.apm.idx.display_name....                     2           2          10.005    37.87
  HasStep([ten1.apm.idx.display_name.eq(JavaSer...                     1           1           0.113
  EndStep                                                              1           1           0.076
  HasStep([ten1.apm.idx.str4.eq(998)])                                 1           1           0.208
  EndStep                                                              1           1           0.169
RangeGlobalStep(0,2147483647)                                          2           2           0.177     0.67
HasStep([ten1.apm.idx.discovery_ts.lte(16433131...                     2           2           0.616     2.33
OrderGlobalStep([[value(ten1.apm.idx.last_seen_...                     2           2           0.281     1.06
RangeGlobalStep(0,1000)                                                2           2           0.115     0.44
GroupStep(value(uid),[FoldStep, OrderLocalStep(...                     1           1           0.323     1.22
LambdaFlatMapStep(lambda)                                              2           2           0.082     0.31
RangeGlobalStep(0,2500)                                                2           2           0.062     0.23
PropertyMapStep([uid, ten1.apm.idx.display_name...                     2           2           0.373     1.41
                                            >TOTAL                     -           -          26.417        -
 


Re: dynamic graphics, limits and global index

Matthew Nguyen <nguyenm9@...>
 

Thanks Marc.  Currently triplestore/LPG is on hold awaiting streaming incidental edge queries in order to play some more.  Hoping we will see a day when LPG/3store harmonize.  


-----Original Message-----
From: hadoopmarc@...
To: janusgraph-users@...
Sent: Sun, Feb 6, 2022 1:56 pm
Subject: Re: [janusgraph-users] dynamic graphics, limits and global index

Hi Matt,

Adding to what I stated above about independent composite indices for separate graphs on the same storage backend, the issue turns out to be more nuanced for mixed indices on an indexing backend; see the recent question:

https://lists.lfaidata.foundation/g/janusgraph-users/topic/88879391

I thought it useful to add it to this thread too.

Marc

PS Good to hear that JanusGraph can possibly support your use case!


Re: Forcing Janusgraph to use indices when performing traversal with Union step

Boxuan Li
 

Hi Brad,

Can you post the profile result of both queries? You can retrieve profile results by adding `.profile()` to the end of your query.

Best,
Boxuan


Re: JanusGraph database cache on distributed setup

hadoopmarc@...
 

How fast is "immediately"? A well-dimensioned Cassandra or ScyllaDB cluster (with its own block cache!) should be able to serve requests at the millisecond level.

https://www.scylladb.com/2017/09/18/scylla-2-0-workload-conditioning/

You only run into trouble with queries that ask for tens or hundreds of vertices, but one may ask whether it is reasonable to expect realtime, millisecond-level responses for such large queries.

Best wishes,     Marc


Re: separate elastic search for separate graph

hadoopmarc@...
 

I had to try it for myself and, indeed, the ref docs are not very clear about this. In order to give the second graph a separate, independent mixed index, you have to adapt the properties file for the second graph like this (in this case a variant of janusgraph-cql-es.properties):

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql

storage.hostname=127.0.0.1
storage.cql.keyspace=janusgraph2

cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25

index.search2.backend=elasticsearch
index.search2.hostname=127.0.0.1
index.search2.index-name=janusgraph2
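If several graphs have to be configured this way, the per-graph differences could be generated rather than hand-edited. The Java sketch below is a hypothetical helper (`PerGraphConfig.forGraph` is not part of JanusGraph) that copies a base configuration, moves the `index.search.*` settings to a new index backend name such as `search2`, and sets a separate `storage.cql.keyspace` and `index-name`, mirroring the edits in the file above. Mixed indexes for that graph would then have to be built against the new backend name, as in the GraphOfTheGodsFactory call with 'search2'.

```java
import java.util.Properties;

class PerGraphConfig {
    // Hypothetical helper: derives the configuration of an additional graph
    // from a base configuration. Settings under index.<oldName>.* are moved to
    // index.<newName>.*, and the CQL keyspace and Elasticsearch index name are
    // overridden so the new graph does not reuse the base graph's values.
    static Properties forGraph(Properties base, String keyspace,
                               String oldName, String newName, String esIndexName) {
        String oldPrefix = "index." + oldName + ".";
        Properties out = new Properties();
        for (String key : base.stringPropertyNames()) {
            String newKey = key.startsWith(oldPrefix)
                ? "index." + newName + "." + key.substring(oldPrefix.length())
                : key;
            out.setProperty(newKey, base.getProperty(key));
        }
        out.setProperty("storage.cql.keyspace", keyspace);
        out.setProperty("index." + newName + ".index-name", esIndexName);
        return out;
    }

    public static void main(String[] args) {
        // Base settings taken from the first graph's properties file (assumed).
        Properties base = new Properties();
        base.setProperty("gremlin.graph", "org.janusgraph.core.JanusGraphFactory");
        base.setProperty("storage.backend", "cql");
        base.setProperty("storage.hostname", "127.0.0.1");
        base.setProperty("storage.cql.keyspace", "janusgraph");
        base.setProperty("index.search.backend", "elasticsearch");
        base.setProperty("index.search.hostname", "127.0.0.1");

        Properties second = forGraph(base, "janusgraph2", "search", "search2", "janusgraph2");
        // Print the derived settings in a stable order, properties-file style.
        second.stringPropertyNames().stream().sorted()
              .forEach(k -> System.out.println(k + "=" + second.getProperty(k)));
    }
}
```

The printed lines could be saved as the second graph's properties file; whether the same approach carries over to ConfiguredGraphFactory templates is not verified here.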

After loading the graphs you can check against the indexing backend:

$ curl http://127.0.0.1:9200/_cat/indices
green  open .geoip_databases    O7iNY_U4Sui1yfFontk3lA 1 0 42 0 38.8mb 38.8mb
yellow open janusgraph_edges    OfbruO3BRIOPLMVYwUSWRw 1 1  6 0  4.4kb  4.4kb
yellow open janusgraph_vertices EGmMEvXuQXiIxTasEnSIDw 1 1  6 0  3.8kb  3.8kb
yellow open janusgraph2_vertices  UQITuYsnTA2d2J8MyyblLA 1 1  6 0  3.8kb  3.8kb
yellow open janusgraph2_edges     d1ZOPXVSTze0HQkg_vEz1A 1 1  6 0  4.4kb  4.4kb

Also, if you use the GraphOfTheGodsFactory for testing with two graphs, you have to use it like:
gremlin> GraphOfTheGodsFactory.load(graph1, 'search', false)
gremlin> GraphOfTheGodsFactory.load(graph2, 'search2', false)

I did not check whether this works for the ConfiguredGraphFactory, but if you find it impossible, either of us should report an issue for it (together with extending the ref docs with the above).

Best wishes,    Marc


Forcing Janusgraph to use indices when performing traversal with Union step

brad@...
 

Hello,

I need to execute JanusGraph traversals using a Union step that contains multiple 'has' steps, each of which tests an indexed property for equality with a value. Prior to the Union step, there is another 'has' step, which also tests an indexed property.

For example:

    g.V().has('indexed-prop1', 'value1').union(has('indexed-prop2', 'value2'), has('indexed-prop3', 'value3'))

How can I verify whether or not the traversal is using indices 'indexed-prop2' and 'indexed-prop3'?  I suspect that it is not.  The traversal seems to take a long time to run; however, when I simplify the traversal, by having only one 'has' step within the union, it runs very quickly.

What is the best way to execute this type of traversal? My application is written in Java.

Thanks in advance,
Brad


Re: dynamic graphics, limits and global index

Matthew Nguyen <nguyenm9@...>
 

Hi Marc, I follow what you're saying, but I will point out that it doesn't have to play out to exabytes. The reason is that IDs are non-recyclable and there are losses due to bulk-load reservations and deletions. I'm looking at JG from a 3store point of view, where a trillion triples isn't out of reach when it comes to knowledge graphs. Hearing that limits are scoped to the graph makes it much more palatable.

thx, matt


Re: Updating our business info on your site

Misha Brukman
 

Hi Rosy,

Thanks for reaching out! Please let us know what needs to be changed and how, either by filing an issue on GitHub or via this email thread if you don't have a GitHub account, and we'll make the relevant changes on the site.

Best,
Misha

On Thu, Feb 3, 2022 at 1:09 PM Rosy Hunt <rosy.hunt@...> wrote:
Hello

Can you let me know who I need to speak to about updating our info on your resource pages? We have two shiny new products which are worth mentioning, and your site currently links to a somewhat outdated post :)

Many thanks in advance for your help.

Rosy

--
Rosy Hunt
Content Marketing Specialist
Cambridge Intelligence

 

Cambridge Intelligence Limited which is registered in England and Wales with Company Number 07625370 | VAT Number 113 1740 61. Registered Office 6-8 Hills Road, Cambridge, CB2 1JP, UK.


separate elastic search for separate graph

51kumarakhil@...
 

Hi! I've set up Elasticsearch for ConfiguredGraphFactory and Bigtable using the configuration below:

Configurations:
storage.lock.wait-time=100
storage.hbase.ext.google.bigtable.instance.id=<bigtable-id>
index.search.hostname=<host-name>
index.search.index-name=janusgraph_metadata
index.search.port=9243
index.search.elasticsearch.ssl.enabled=true
index.search.elasticsearch.http.auth.basic.password=<password>
index.search.elasticsearch.http.auth.type=basic
index.search.elasticsearch.http.auth.basic.username=<username>
storage.backend=hbase
storage.hostname=localhost
schema.default=none
storage.batch-loading=true
storage.hbase.ext.google.bigtable.project.id=<project-id>
graph.timestamps=MICRO
index.search.elasticsearch.connect-timeout=10000000
index.search.backend=elasticsearch
storage.hbase.ext.hbase.client.connection.impl=com.google.cloud.bigtable.hbase2_x.BigtableConnection
storage.hbase.keyspace=jgex

------------------------------------------------------------------------
--------------------------------------------------------------------------------------

Create Graph
ConfiguredGraphFactory.create('graph_01')


Adding MixedIndex on property 'name'
mgmt = graph_01.openManagement();
mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.single).make();
name = mgmt.getPropertyKey("name");
mgmt.buildIndex('byNomeUniqueMixed', Vertex.class).addKey(name, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search");
mgmt.commit();


Adding Data
graph_01_traversal.addV('person').property('name', 'Tom Cruise')


Fetching Data
graph_01_traversal.V().has("name", textPrefix("Tom"))

output: v[6753]

Question #1: Is this the correct way to set up Elasticsearch?


------------------------------------------------------------------------
--------------------------------------------------------------------------------------


Using the same approach, I create a new graph, say graph_02, with the same configuration and mixed index, and add some data:

graph_02_traversal.addV('person').property('name', 'Tom Holand')

When I try to fetch it:

graph_02_traversal.V().has("name", textPrefix("Tom Holand"))

output:
v[6753]
v[9785]


Here, I'm also getting data from graph_01, despite using graph_02_traversal in the 'has' query.

Question #2: Is there a way to set up a separate ES index for each graph?









JanusGraph database cache on distributed setup

washerath@...
 

In a multi-node JanusGraph cluster, a data modification made on one instance does not sync with the other instances until it reaches the configured expiry time (cache.db-cache-time).

As per the documentation [1], enabling the database-level cache is not recommended in a distributed setup, since cached data is not shared among instances.

Any suggestions for a solution or workaround so that I can see data changes from other JG instances immediately and avoid stale data access?


[1] https://docs.janusgraph.org/operations/cache/#cache-expiration-time


Re: dynamic graphics, limits and global index

hadoopmarc@...
 

Hi Matt,

Correct, but you should really try this out and see for yourself. Also check the janusgraph db folder after having created two graphs and see what files are created.

Per graph, but do you realize that these numbers would require exabytes of storage?

Best wishes,   Marc


Re: hasNext() slow for large number of incoming edges

Boxuan Li
 

Created https://github.com/JanusGraph/janusgraph/issues/2966 to track the streaming feature request.


Re: Exception while creating vertex with custom vertex id

Umesh Gade
 

Thanks Marc for the pointers to check further. 
The issue was not reproduced when we brought the setup back to a 3-node cluster configuration. I will keep watching and collect more details if this issue hits again.

On Mon, Jan 31, 2022 at 2:04 PM <hadoopmarc@...> wrote:
Hi Umesh,

No, it is not clear at all whether the issue and cassandra-4.0 are related, but generally it is not useful to report bugs regarding unsupported configurations. To dig deeper, I would be curious about:
 - can you log the vertex id and label for the transactions that fail, and can you make another transaction fail with these values?
 - the stacktrace suggests that the issue is in querying the graph schema. Can you trigger the exception by manually querying the graph schema for the vertex labels involved?

Best wishes,     Marc



--
Sincerely,
Umesh Gade


Re: Exception while creating vertex with custom vertex id

Umesh Gade
 

Hi Marc,
Yes, we are aware of the compatibility matrix. We are keeping an eye on issues with JG-0.6.0 + Cassandra-4.0. This combination has been running for the last 3-4 months without any issues for our use cases.
Is the above issue definitely a compatibility issue?

On Sun, Jan 30, 2022 at 3:00 PM <hadoopmarc@...> wrote:
Hi Umesh,

On the first line of your first post you state that you use cassandra-4.0. However, support for cassandra-4.0 is still an open issue:

https://github.com/JanusGraph/janusgraph/issues/2325
https://docs.janusgraph.org/changelog/#version-compatibility-matrix

It is confusing, indeed, that the driver version is 4.13.0, but this is correct, see:
https://github.com/JanusGraph/janusgraph/blob/v0.6.1/pom.xml

Best wishes,    Marc

On Sat, Jan 29, 2022 at 06:07 PM, Umesh Gade wrote:
set-vertex-id



--
Sincerely,
Umesh Gade

