
Re: Janusgraph not able to find suitable index for a index enabled property key

hadoopmarc@...
 

Hi Harshit,

Can you please describe the steps you have taken in more detail:
  • creation of property keys and indices in the schema + commit,
  • creation of a vertex with a property in one committed transaction,
  • query of a vertex based on a property value in another transaction.

Marc


Janusgraph not able to find suitable index for a index enabled property key

Harshit Sharma
 

I'm working on a JanusGraph application. To improve Gremlin query performance we are creating two mixed indexes, one for vertices and one for edges.
JanusGraph can use the index for property keys that are created and indexed at the time of index creation, i.e. in the same transaction. If I create and index a new property key in a later transaction, JanusGraph does not use the index when querying on it; instead, it does a full graph scan.
Using the JanusGraph management API I checked that all property keys are indexed and enabled, yet JanusGraph still scans the complete graph when querying on an indexed property key.
Backend index engine -> ElasticSearch
Backend Storage -> Cassandra

Is there anything I'm missing? Any help would be greatly appreciated.


Re: Janusgraph embedded multi instance(JVM) data sync issue

hadoopmarc@...
 

Hi Pawan,

Your requirement for instant synchronization cannot work with JanusGraph caches enabled, because JanusGraph will get data from the cache if available, instead of getting the latest data from the backend. So:

  • set cache.db-cache = false
  • be sure to start a new transaction before querying for the latest data (e.g. by executing a g.tx().commit())

Best wishes,    Marc
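
As a rough illustration of the first point, the sketch below parses a graph.properties text with the Python standard library and reports whether the database cache is on and how long an entry may be served from it. The helper names (parse_properties, cache_report) and the sample text are illustrative assumptions, not part of JanusGraph. It also assumes that cache.db-cache-time is expressed in milliseconds and that the cache is off when the key is absent; a value of 0 is not handled.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def cache_report(props):
    """Summarise the db-cache settings that matter for stale reads."""
    # Assumption: the cache counts as disabled when the key is missing.
    enabled = props.get("cache.db-cache", "false").lower() == "true"
    # Assumption: cache.db-cache-time is given in milliseconds.
    expiry_ms = int(props.get("cache.db-cache-time", "0"))
    return {
        "db_cache_enabled": enabled,
        "max_staleness_seconds": expiry_ms / 1000 if enabled else 0.0,
    }


# Cache-related lines taken from the properties file posted in this thread.
SAMPLE = """
storage.backend=cql
cache.db-cache = true
cache.db-cache-time = 180000
"""

report = cache_report(parse_properties(SAMPLE))
print(report)
```

With the settings from the original post this reports a cache lifetime of 180 seconds, which is at least consistent with the delay of one or two minutes described in the thread.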


Re: Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas
 

Hi Boxuan,

Please see my inline response

1. What do you mean by creating some data? For example, do you mean by creating new vertices or just updating existing vertices? If it’s the latter case, then you could try turning off cache.db-cache option as it might lead to stale data read.
[Pawan] - Creating data means both creating and updating vertices/edges.

2. What is your typical “duration” after which data gets reflected?  
[Pawan] - It seems to be within one or two minutes.

3. What is your cql replication factor and read & write consistency levels? Are they default values? Also, how many Cassandra nodes do you have and are they in the same data center?
[Pawan] - These should be the defaults; I am using only the graph properties listed in my original mail. It is an 8-node cluster (3 masters + 5 nodes). All Cassandra nodes are in the same data center.

Thanks,
Pawan

--
Thanks & Regard

PAWAN SHRIWAS


Re: Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas
 

The same also happens with two or more Gremlin consoles: when we create or update something on console 1, it is not reflected on the others.


--
Thanks & Regard

PAWAN SHRIWAS


Re: Janusgraph embedded multi instance(JVM) data sync issue

Boxuan Li
 

Hi Pawan,

A couple of questions:

1. What do you mean by creating some data? For example, do you mean by creating new vertices or just updating existing vertices? If it’s the latter case, then you could try turning off cache.db-cache option as it might lead to stale data read.

2. What is your typical “duration” after which data gets reflected?

3. What is your cql replication factor and read & write consistency levels? Are they default values? Also, how many Cassandra nodes do you have and are they in the same data center?

Best,
Boxuan




Re: Can I use spark computer on CQL without hadoop cluster

Pawan Shriwas
 

Thanks team,

I will check it based on your response and let you know if anything is needed.




--
Thanks & Regard

PAWAN SHRIWAS


Janusgraph embedded multi instance(JVM) data sync issue

Pawan Shriwas
 

Hi All,

I am facing a problem with the synchronization of data between multiple embedded-mode JanusGraph instances.

If we create some data in the graph using JVM 1 and commit it, and then read the same data from JVM 2, the change is not reflected for some time.

I want the same information to be available to all instances once any CRUD operation has been committed.

I am using the same graph properties in all instances of embedded JanusGraph.

##############graph.properties#####################
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=cql-dns
storage.cql.keyspace=janusgraphdbks
storage.port=30808
storage.username=user123
storage.password=user12345
schema.default=none
schema.constraints=true

index.search-central-graph.backend=elasticsearch
index.search-central-graph.hostname=api-es-instance1:9200
index.search-central-graph.index-name=search-central-graph
index.search-central-graph.elasticsearch.http.auth.type=basic
index.search-central-graph.elasticsearch.http.auth.basic.username=admin
index.search-central-graph.elasticsearch.http.auth.basic.password=admin 

cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25
query.batch = true
query.fast-property = true
query.batch-property-prefetch = true
storage.buffer-size=1024
######################property file end############################

Please let me know if anyone has faced this and how to prevent it.

Thanks,
Pawan



Re: Slowing of janusgraph

hadoopmarc@...
 

Hmm, these figures seem perfectly reasonable. It seems I was also wrong about the db-cache heap region not being shared between graphs. So you have to gather more information about what is going wrong. Some ideas:

  1. does restarting nginx make a difference? Communication between JanusGraph server and gremlin clients runs over websockets. Make sure there is no issue with communication channels kept occupied.
  2. https://lists.lfaidata.foundation/g/janusgraph-users/topic/79935654#2886 This thread gives an nginx setup using a DNS virtual IP address. Possibly this makes a difference in websocket communication issues.
  3. disable the db-cache and see if this makes a difference (this can only make a difference if there are many clients with transaction caches)
  4. check the logs of janusgraph server for other indications

Best wishes,      Marc


Re: Can I use spark computer on CQL without hadoop cluster

hadoopmarc@...
 

It is also possible to run spark on kubernetes (in combination with distributed storage like S3 or minio):
https://spark.apache.org/docs/latest/running-on-kubernetes.html

It will require some time to get your head around this, but note that you can do this with or without the spark operator installed on your kubernetes cluster.

Marc


Re: Can I use spark computer on CQL without hadoop cluster

Boxuan Li
 

Hi Pawan,

Do you want to run Spark traversal on a Spark standalone cluster rather than a Hadoop Yarn cluster? In that case, you could follow the JanusGraph documentation or check out this guide on Medium.

Best,
Boxuan


Can I use spark computer on CQL without hadoop cluster

Pawan Shriwas
 

Hi All,

I am checking the possibility of using a graph computer with Spark on a CQL backend without a Hadoop installation.

Please let me know whether we can do this and how I can achieve it. I don't want to introduce a Hadoop cluster just for this use case. I would appreciate it if anyone could share some resources on this.
  
Thanks,
Pawan


Re: Slowing of janusgraph

51kumarakhil@...
 

Hi, thanks Marc for the quick reply. Below are the details you asked for.

Max. Heap Size (Estimated): 3.85G

No of graphs: 7 to 8

Database-cache:
       cache.db-cache = true
       cache.db-cache-clean-wait = 20
       cache.db-cache-time = 180000
       cache.db-cache-size = 0.5
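
For orientation, these numbers can be turned into an approximate cache size. The sketch below assumes the JanusGraph convention that a cache.db-cache-size value between 0 and 1 is a fraction of the JVM heap while larger values are an absolute size in bytes. It also treats the reported "3.85G" as GiB. The function name db_cache_bytes is hypothetical.

```python
GIB = 1024 ** 3


def db_cache_bytes(cache_size, max_heap_bytes):
    """Approximate database-cache size in bytes.

    Assumption: values below 1 are a fraction of the heap,
    larger values are already an absolute number of bytes.
    """
    if cache_size <= 0:
        raise ValueError("cache size must be positive")
    if cache_size < 1:
        return int(cache_size * max_heap_bytes)
    return int(cache_size)


heap = 3.85 * GIB                 # "Max. Heap Size (Estimated): 3.85G"
size = db_cache_bytes(0.5, heap)  # cache.db-cache-size = 0.5
print(f"{size / GIB:.3f} GiB")
```

Under these assumptions, roughly half of the heap is reserved for the database cache.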
 




Re: Slowing of janusgraph

hadoopmarc@...
 

Have you tried to increase JVM memory settings for JanusGraph Server? Also, check the settings for the database cache size, because I tend to remember that each graph has its own cache. Can you provide some numbers: number of graphs, database cache settings, JVM memory settings, etc.

Best wishes,    Marc


Slowing of janusgraph

51kumarakhil@...
 

Hi, I have three JanusGraph (0.5.3) servers pointing to the same Bigtable instance with the same configuration, shown below.

Configurations:
I'm using ConfigurationManagementGraph; below is the properties file.

--------------------------------<janusgraph-bigtable-configurationgraph.properties>----------------------------------

gremlin.graph=org.janusgraph.core.ConfiguredGraphFactory

storage.backend=hbase
storage.hbase.ext.google.bigtable.instance.id=
storage.hbase.ext.google.bigtable.project.id=
storage.hbase.ext.hbase.client.connection.impl=com.google.cloud.bigtable.hbase2_x.BigtableConnection
graph.timestamps=MICRO
storage.lock.wait-time=100

graph.graphname=ConfigurationManagementGraph
storage.hostname=127.0.0.1

cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

--------------------------------------------------------------------------------------------------------------------------------------------


And the gremlin-server.yaml file looks like this:
-------------------------------gremlin-server.yaml---------------------------------------------------------------------------------------------

host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 100000
channelizer: org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  graph: conf/janusgraph-inmemory.properties,
  ConfigurationManagementGraph: conf/janusgraph-bigtable-configurationgraph.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}
}}}


--------------------------------------------------------------------------------------------------------------------------------------------------------------




All the servers share the same configuration. I have also set up nginx in front of these servers. When I have to create a graph, I first connect to nginx, nginx connects to the most available JanusGraph server, and that server creates the graph for me and stores it in Bigtable.
I can now access this graph from any of the three servers.

Till now everything is working as expected.

Issue 1: Slowing of servers

Every day I generate a new graph with around 150K vertices and 250K edges. The first time, the servers generate the graph, but later on execution slows down and at some point it stops completely. The server does not process anything, gets stuck midway and does not accept any requests. To solve this I have to restart the servers every time, which leads to the second issue.

Issue 2: Deleting of graphs

While all servers are running, I can delete a graph after it has been created. But the moment I restart any of the servers, I am no longer able to delete the graph; the server that was restarted continuously throws the error "Table Not Found". To resolve this, I have to stop all the servers first, delete all the graphs from Bigtable and then restart the servers. But after the first graph creation by all the servers, this leads to Issue 1 again.




Re: Failure on mvn clean install

hadoopmarc@...
 

Thanks for reporting. Somehow, the tests for janusgraph-examples do not pass anymore when run in a stand-alone fashion (these tests are also run in the CI from the main pom.xml in https://github.com/JanusGraph/janusgraph/blob/master/pom.xml , where obviously they do pass). I reported an issue for this. https://github.com/JanusGraph/janusgraph/issues/2911

If you just want to run the examples you can build the jar with:
mvn clean install -DskipTests
Best wishes,     Marc


Re: Potential transaction issue (JG 0.6.0)

Boxuan Li
 

I guess this has something to do with race conditions. Although I couldn't reproduce the exact issue, I found a similar race condition that caused an NPE in `expireSchemaElement` method (https://github.com/JanusGraph/janusgraph/issues/2898).  The fix is here: https://github.com/JanusGraph/janusgraph/pull/2899 which will be released in the next minor version (0.6.1). This PR also includes the temporary fix proposed by Sergey:

public InternalVertex getInternalVertex(long vertexId) {
    // TODO temporary fix
    if (isClosed()) {
        return null;
    }
    // return vertex but potentially check for existence
    return vertexCache.get(vertexId, internalVertexRetriever);
}

If anyone is able to find a steady way to reproduce this issue and/or encounter a similar NPE issue somewhere else, please let me know, thanks! A code review is also very welcome.

Best,
Boxuan


Failure on mvn clean install

benanavd@...
 

Just unpacked janusgraph-full-0.6.0.zip and got the server started with ./bin/janusgraph-server.sh console. It started up fine. Then I cd to the examples directory and do a mvn clean install. I get the following errors:
[ERROR] Failures: 
[ERROR]   GraphAppTest.openGraphConfigNotFound:70 Unexpected exception type thrown ==> expected: <java.io.FileNotFoundException> but was: <org.apache.commons.configuration2.ex.ConfigurationException>
[ERROR]   GraphAppTest.openGraphNullConfig:65 Unexpected exception type thrown ==> expected: <java.lang.NullPointerException> but was: <java.lang.RuntimeException>
[INFO] 
[ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for JanusGraph-Examples: Examples for JanusGraph 0.6.0:
[INFO] 
[INFO] JanusGraph-Examples: Examples for JanusGraph ....... SUCCESS [  2.603 s]
[INFO] Example-Common: Common Graph Code for Examples ..... FAILURE [  6.387 s]
[INFO] Example-BerkeleyJE: BerkeleyJE Storage, Lucene Index SKIPPED
[INFO] Example-Cql: Cassandra CQL Storage, Elasticsearch Index SKIPPED
[INFO] Example-HBase: HBase Storage, Solr Index ........... SKIPPED
[INFO] Example-RemoteGraph: Example with RemoteGraph ...... SKIPPED
[INFO] Example-TinkerGraph: Example with TinkerGraph ...... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] -----------------------------------------------
 
Any ideas how to fix that?
 
I've been trying to setup JanusGraph all day and can't seem to figure it out!


Re: Parameterized bulk insert (addV) script in gremlin-python

Scott Friedman
 

Wow, works like a charm using gremlin-python, and I don't even have to use a script!

Thanks for the quick wisdom!

SF


Re: Parameterized bulk insert (addV) script in gremlin-python

hadoopmarc@...
 

Hi Scott,

You can try to use this thread for inspiration:
https://groups.google.com/g/gremlin-users/c/HtBRwaU0pnQ/m/duFs5-imBAAJ

2 1/2 years ago I was impressed by this solution! It really iterates over the input data and adds multiple vertices.

Best wishes,   Marc
