
Slowing of janusgraph

51kumarakhil@...
 

Hi, I have three JanusGraph (0.5.3) servers pointing to the same Bigtable instance, all with the configuration shown below.

Configurations:
I'm using ConfigurationManagementGraph; the properties file is below.

--------------------------------<janusgraph-bigtable-configurationgraph.properties>----------------------------------

gremlin.graph=org.janusgraph.core.ConfiguredGraphFactory

storage.backend=hbase
storage.hbase.ext.google.bigtable.instance.id=
storage.hbase.ext.google.bigtable.project.id=
storage.hbase.ext.hbase.client.connection.impl=com.google.cloud.bigtable.hbase2_x.BigtableConnection
graph.timestamps=MICRO
storage.lock.wait-time=100

graph.graphname=ConfigurationManagementGraph
storage.hostname=127.0.0.1

cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

--------------------------------------------------------------------------------------------------------------------------------------------


The gremlin-server.yaml file looks like this:
-------------------------------gremlin-server.yaml---------------------------------------------------------------------------------------------

host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 100000
channelizer: org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  graph: conf/janusgraph-inmemory.properties,
  ConfigurationManagementGraph: conf/janusgraph-bigtable-configurationgraph.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}
}}}


--------------------------------------------------------------------------------------------------------------------------------------------------------------




All the servers share the same configuration. I have also set up nginx in front of these servers. To create a graph, I first connect to nginx, which forwards the connection to the most available JanusGraph server; that server creates the graph and stores it in Bigtable.
I can then access this graph from any of the three servers.
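
The create/access flow described above could be sketched from a Python client roughly as follows. This is only an illustration under assumptions: `graphName` is a binding name chosen here, the one-graph-per-day naming scheme is hypothetical, and `ConfiguredGraphFactory.create` needs a template configuration to have been registered on the server beforehand.

```python
from datetime import date

# Groovy scripts evaluated by Gremlin Server. create/open/drop are
# ConfiguredGraphFactory calls; `graphName` is a binding chosen for this sketch.
# Each script ends in a plain value so the response is easy to serialize.
CREATE_SCRIPT = "ConfiguredGraphFactory.create(graphName); true"
COUNT_SCRIPT = "ConfiguredGraphFactory.open(graphName).traversal().V().count().next()"
DROP_SCRIPT = "ConfiguredGraphFactory.drop(graphName); true"


def daily_graph_name(prefix, day):
    """Hypothetical naming scheme: one graph per calendar day."""
    return f"{prefix}_{day:%Y%m%d}"


def run(client, script, graph_name):
    """Submit one script with the graph name passed as a binding.

    `client` is expected to be an already connected
    gremlin_python.driver.client.Client (via nginx or a single server).
    """
    return client.submit(script, {"graphName": graph_name}).all().result()
```

Passing the graph name as a binding rather than pasting it into the script keeps the script text constant across days.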

Up to this point everything works as expected.

Issue 1: Slowing of servers

Every day I generate a new graph with around 150K vertices and 250K edges. The first time, the servers generate the graph, but later executions slow down and at some point the servers stop completely: they get stuck midway, process nothing, and accept no further requests. To recover I have to restart the servers every time, which leads to the second issue.

Issue 2: Deleting graphs

While all the servers are running, I can delete any graph that has been created. But once I restart any one of the servers, I can no longer delete the graph, and the restarted server continuously throws the error "Table Not Found". To resolve this I have to stop all the servers, delete all the graphs from Bigtable, and then restart the servers. But after the first graph creation by the servers, Issue 1 appears again.




Re: Failure on mvn clean install

hadoopmarc@...
 

Thanks for reporting. Somehow, the tests for janusgraph-examples do not pass anymore when run in a stand-alone fashion (these tests are also run in the CI from the main pom.xml in https://github.com/JanusGraph/janusgraph/blob/master/pom.xml , where obviously they do pass). I reported an issue for this. https://github.com/JanusGraph/janusgraph/issues/2911

If you just want to run the examples you can build the jar with:
mvn clean install -DskipTests
Best wishes,     Marc


Re: Potential transaction issue (JG 0.6.0)

Boxuan Li
 

I guess this has something to do with race conditions. Although I couldn't reproduce the exact issue, I found a similar race condition that caused an NPE in `expireSchemaElement` method (https://github.com/JanusGraph/janusgraph/issues/2898).  The fix is here: https://github.com/JanusGraph/janusgraph/pull/2899 which will be released in the next minor version (0.6.1). This PR also includes the temporary fix proposed by Sergey:

public InternalVertex getInternalVertex(long vertexId) {
    // TODO temporary fix
    if (isClosed()) {
        return null;
    }
    // return vertex but potentially check for existence
    return vertexCache.get(vertexId, internalVertexRetriever);
}

If anyone is able to find a steady way to reproduce this issue and/or encounter a similar NPE issue somewhere else, please let me know, thanks! A code review is also very welcome.

Best,
Boxuan


Failure on mvn clean install

benanavd@...
 

Just unpacked janusgraph-full-0.6.0.zip and got the server started with ./bin/janusgraph-server.sh console. It started up fine. Then I changed to the examples directory and ran mvn clean install. I get the following errors:
[ERROR] Failures: 
[ERROR]   GraphAppTest.openGraphConfigNotFound:70 Unexpected exception type thrown ==> expected: <java.io.FileNotFoundException> but was: <org.apache.commons.configuration2.ex.ConfigurationException>
[ERROR]   GraphAppTest.openGraphNullConfig:65 Unexpected exception type thrown ==> expected: <java.lang.NullPointerException> but was: <java.lang.RuntimeException>
[INFO] 
[ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for JanusGraph-Examples: Examples for JanusGraph 0.6.0:
[INFO] 
[INFO] JanusGraph-Examples: Examples for JanusGraph ....... SUCCESS [  2.603 s]
[INFO] Example-Common: Common Graph Code for Examples ..... FAILURE [  6.387 s]
[INFO] Example-BerkeleyJE: BerkeleyJE Storage, Lucene Index SKIPPED
[INFO] Example-Cql: Cassandra CQL Storage, Elasticsearch Index SKIPPED
[INFO] Example-HBase: HBase Storage, Solr Index ........... SKIPPED
[INFO] Example-RemoteGraph: Example with RemoteGraph ...... SKIPPED
[INFO] Example-TinkerGraph: Example with TinkerGraph ...... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] -----------------------------------------------
 
Any ideas how to fix that?
 
I've been trying to set up JanusGraph all day and can't seem to figure it out!


Re: Parameterized bulk insert (addV) script in gremlin-python

Scott Friedman
 

Wow, works like a charm using gremlin-python, and I don't even have to use a script!

Thanks for the quick wisdom!

SF


Re: Parameterized bulk insert (addV) script in gremlin-python

hadoopmarc@...
 

Hi Scott,

You can try to use this thread for inspiration:
https://groups.google.com/g/gremlin-users/c/HtBRwaU0pnQ/m/duFs5-imBAAJ

2 1/2 years ago I was impressed by this solution! It really iterates over the input data and adds multiple vertices.
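
A minimal sketch of one commonly used pattern for iterating over a list parameter is shown below. Whether it matches the linked thread exactly is an assumption; the binding name `names`, the batch size, and the helper names are all chosen here for illustration.

```python
from itertools import islice

# Server-side Gremlin: unfold the bound list and add one vertex per element.
# `select` resolves to the anonymous traversal `__.select` in the server's
# Groovy script engine; `count()` makes the script return a single number.
BULK_ADD_SCRIPT = (
    "g.inject(names).unfold().as('n')"
    ".addV().property('name', select('n'))"
    ".count()"
)


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_requests(names, batch_size=500):
    """Return (script, bindings) pairs, one per batch of names."""
    return [(BULK_ADD_SCRIPT, {"names": batch}) for batch in chunked(names, batch_size)]


def submit_all(client, requests):
    """Submit every batch; `client` is a gremlin_python Client. Returns vertices added."""
    added = 0
    for script, bindings in requests:
        added += client.submit(script, bindings).all().result()[0]
    return added
```

Batching keeps each request small; the batch size of 500 is arbitrary and would need tuning against the server's timeout and message-size settings.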

Best wishes,   Marc


Parameterized bulk insert (addV) script in gremlin-python

Scott Friedman
 

Good afternoon,

I'm attempting to use gremlin-python to do bulk vertex or edge inserts, and I'd figured I could use params to send in a simple script.  A simple proof of concept would be:

cmd = 'g.addV().property("name", values)'
params = { 'values': ['name1', 'name2'] }
result_set = conn._client.submit(cmd, params)

...but when I execute that, I get a single vertex added with "[name1, name2]" as its name.  I suppose this makes sense.  But is there a way to issue a compact loop-based script over an arbitrary list in my parameters?

I could always forgo the script-based approach and use the python API to make a massive query of repeated addV() calls (which is my present implementation), but I'd hoped that a parameter-based, script-based solution would be more efficient (and elegant).

Suggestions are very welcome!

Regards,
Scott


Re: JG as a 3store, rdf support

Matthew Nguyen
 

Thanks Marc, hadn't seen ERGS but looks interesting and will take a look.  


Re: Using a user-supplied string as vertex ID

Scott Friedman
 

Thanks, Boxuan!  Looks like a great discussion in that github issue; I hope something eventually comes of it!


Re: Python output to mgmt queries

dimi
 

Hi Marc, 
Thank you for your reply. I would like to access this information only from the schema (my graph is empty now). Your second solution could work, but it only prints the result, so to get the labels I would need to extract them with some regex.
However, I think I have found a solution.
It seems that Python converts the object org.janusgraph.graphdb.types.VertexLabelVertex to gremlin_python.structure.graph.Vertex. I do not know if this is intended or accidental (for janusgraph-0.6.x with gremlin_python-3.5.1).
However, I can get a list of labels if I convert the labels to strings on the server before the result is returned to Python.
For example:
from gremlin_python.driver.client import Client
client = Client('ws://localhost:8182/gremlin', 'mygraph')
mgmt = "mygraph.openManagement()"
get_v_labels = mgmt + ".getVertexLabels().collect {a -> a.name()}"
client.submit(get_v_labels).all().result()

In this way, I can also get properties etc.
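
The same idea could be extended to edge labels and property keys along these lines. This is a sketch, not a tested recipe: the helper name and the returned map keys are invented here, and it assumes `getRelationTypes` accepts the `EdgeLabel` and `PropertyKey` classes as it does in the JanusGraph management API.

```python
def schema_names_script(graph_var="mygraph"):
    """Build a Groovy script that returns schema element names as plain strings.

    `graph_var` is the server-side variable the graph is bound to; it is
    pasted into the script, so only a plain identifier is accepted.
    """
    if not graph_var.isidentifier():
        raise ValueError(f"not a valid graph variable name: {graph_var!r}")
    return "\n".join([
        f"def mgmt = {graph_var}.openManagement()",
        "def names = [",
        "    vertexLabels: mgmt.getVertexLabels().collect { it.name() },",
        "    edgeLabels: mgmt.getRelationTypes(org.janusgraph.core.EdgeLabel.class).collect { it.name() },",
        "    propertyKeys: mgmt.getRelationTypes(org.janusgraph.core.PropertyKey.class).collect { it.name() }",
        "]",
        "mgmt.rollback()  // read-only use, so discard the management transaction",
        "names",
    ])


def fetch_schema_names(client, graph_var="mygraph"):
    """Submit the script with a gremlin_python Client and return the raw result."""
    return client.submit(schema_names_script(graph_var)).all().result()
```

Converting to strings on the server side avoids relying on how schema vertices happen to be serialized to Python.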

Thanks again and best wishes, 
Dimi


Re: JG as a 3store, rdf support

hadoopmarc@...
 

Hi Matthew,

Not an answer to your questions, but a few remarks that might help anyway:
  • While a single client has its limits in adding vertices and edges, people use distributed computing frameworks such as Apache Spark to increase overall ingestion rates.
  • I could not help doing a single Google search myself and hit upon:
    https://github.com/IBM/expressive-reasoning-graph-store
    which seems pretty recent, though immature.

Best wishes,   Marc


Re: Python output to mgmt queries

hadoopmarc@...
 

I am not sure what you are up to and the API changes in remote connections may have confused you.

If you want to see the labels of all vertices in the graph (for janusgraph-0.6.x with gremlin_python-3.5.1):
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
g.V().label().toList()
If you want to see the vertex labels that you defined in the JanusGraph schema:
from gremlin_python.driver.client import Client
client = Client('ws://localhost:8182/gremlin', 'graph')
for line in client.submit("graph.openManagement().printSchema()").next():
    print(line)
Best wishes,    Marc




Re: Using a user-supplied string as vertex ID

Boxuan Li
 

Hi Scott,

Currently, JanusGraph does not support user-specified string identifiers. You could check out https://github.com/JanusGraph/janusgraph/issues/1221 to see discussions on this topic.
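
As a rough illustration of the hash-to-integer idea raised in the question: an MD5 digest is 128 bits, so it has to be truncated to fit a positive 64-bit long, and JanusGraph reserves some of those bits for its own use. The sketch below keeps an assumed 48 bits; that width, and the function names, are assumptions for illustration only, and truncation makes collisions possible, so they are checked explicitly.

```python
import hashlib


def string_to_long_id(key, id_bits=48):
    """Map a string to a positive integer below 2**id_bits via truncated MD5.

    `id_bits` is an assumed safe width, not a documented JanusGraph limit.
    """
    if not 1 <= id_bits <= 63:
        raise ValueError("id_bits must be between 1 and 63")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Take the first 8 bytes (64 bits) and keep only the top `id_bits` bits.
    value = int.from_bytes(digest[:8], "big") >> (64 - id_bits)
    return value or 1  # vertex IDs must be strictly positive


def assign_ids(keys, id_bits=48):
    """Return {key: id}, raising if two different keys map to the same id."""
    owner = {}
    ids = {}
    for key in keys:
        vid = string_to_long_id(key, id_bits)
        if owner.setdefault(vid, key) != key:
            raise ValueError(f"collision: {key!r} and {owner[vid]!r} -> {vid}")
        ids[key] = vid
    return ids
```

Collisions become likely once the number of keys approaches the order of 2**(id_bits/2), so a lookup table or a unique-indexed property holding the original string may be the safer design.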

Best,
Boxuan


Re: Important | Queries for edge label connections

Boxuan Li
 

Hi Pawan,

Regarding your first question, try this in Java:

mgmt.getEdgeLabel("belongsTo").mappedConnections()

which should give you a list of Java objects that contain the outgoing and incoming labels for each connection.

In the Gremlin console, you could do this:

mgmt.getEdgeLabel("knows").mappedConnections()[0].incomingVertexLabel
mgmt.getEdgeLabel("knows").mappedConnections()[0].outgoingVertexLabel

which is a bit hacky but should hopefully work. Feel free to create a feature request on GitHub and link to this thread.
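
From a Python client, listing all connections of one edge label might look roughly like the sketch below. It assumes the graph is bound to a server-side variable named `graph`, that the edge label exists (otherwise `getEdgeLabel` returns null), and that the connection getters return vertex label objects with a `name()` method; the binding and helper names are invented for this example.

```python
# Groovy evaluated on the server; `edgeLabelName` is a binding chosen here.
CONNECTIONS_SCRIPT = """
def mgmt = graph.openManagement()
def rows = mgmt.getEdgeLabel(edgeLabelName).mappedConnections().collect {
    [outgoing: it.outgoingVertexLabel.name(), incoming: it.incomingVertexLabel.name()]
}
mgmt.rollback()
rows
"""


def connection_pairs(rows):
    """Turn the returned maps into a sorted list of (outgoing, incoming) pairs."""
    return sorted({(row["outgoing"], row["incoming"]) for row in rows})


def list_connections(client, edge_label):
    """Fetch the connections of `edge_label` using a gremlin_python Client."""
    rows = client.submit(CONNECTIONS_SCRIPT, {"edgeLabelName": edge_label}).all().result()
    return connection_pairs(rows)
```

The length of the returned list then gives the number of vertex label pairs the edge label connects.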

Regarding your second question, IIRC there is no such API available.

Best regards,
Boxuan


Using a user-supplied string as vertex ID

Scott Friedman
 

Greetings,

I'd like to specify unique string IDs for newly-added vertices in JanusGraph.  I've verified that I can set graph.set-vertex-id to True and then add integer IDs via my (python) client as expected.

Does JanusGraph support user-specified string identifiers in any fashion?  If not, is there a recommended way to map into integers (e.g., a potentially lengthy MD5 hash?) or will such a long number damage JanusGraph's ID indexing?

Thanks much for your time!

Scott


JG as a 3store, rdf support

Matthew Nguyen
 

Hey folks, been playing with JG the last couple weeks and am able to import a few million triples using rdf2g (cassandra/solr backend).  I'm processing around 1000 triples/sec currently after turning on batch-loading and disabling a few pre-conditions :-). While this may be suitable for loading a few million triples, it will take far too long to load a billion+.  I've also gotten sparql-gremlin working but haven't yet run it through its paces though I'm disheartened to see that the project appears to have been abandoned. 

I'm looking to communicate with others interested in trying to use JG as a 3store given the lack of available enterprise capable 3store opensource projects currently available.  After some searching on here, there appears to have been some bits & pieces of conversations from various people through the years re: RDF processing.  

Has anyone on here made any significant strides with rdf & JG and can share their experiences?  And if there's a better place to discuss this topic, please advise.

thx, matt


Important | Queries for edge label connections

Pawan Shriwas
 

Hi All,

I need a solution for the following two things; I tried but was not able to find one.

1. I want to list the edge label connections created in JanusGraph:

       mgmt.addConnection("belongsTo", vertexLabel1, vertexLabel2);
       mgmt.addConnection("belongsTo", vertexLabel1, vertexLabel3);
       mgmt.addConnection("belongsTo", vertexLabel3, vertexLabel4);
       mgmt.addConnection("belongsTo", vertexLabel5, vertexLabel6);

   After the steps above, printSchema shows only the edge labels, not how many times each one is used between vertex labels.

2. I want to update the direction of one connection of an edge label.

   Current:

       mgmt.addConnection("belongsTo", vertexLabel1, vertexLabel2);   // out direction towards vertexLabel2

   Expected direction:

       mgmt.addConnection("belongsTo", vertexLabel2, vertexLabel1);

   I want only one direction to exist between these two vertex types. If this call creates another connection, then I want the previous direction to be removed.



Please review and let me know how I can achieve this.  Thanks in advance.



Thanks,

Pawan 

    

 

     




Python output to mgmt queries

dimi
 

Hi! I am trying to parse some basic info from the schema via Python but I am probably doing something wrong. 

I can request the management info with the Client object in python:

from gremlin_python.driver.client import Client
client = Client('ws://localhost:8182/gremlin', 'mygraph')
mgmt = "mygraph.openManagement()"
get_v_labels = mgmt + ".getVertexLabels()"
tt = client.submit(get_v_labels).all().result()

I have 16 labels, and if I run it in the gremlin console I obtain a list of labels. In python, instead, 
I get
[v[74253], v[74765], v[75277], v[75789], v[76301], v[76813], v[77325], v[77837], v[78349], v[78861], v[79373], v[79885], v[80397], v[80909], v[81421], v[81933]]

If I do
for t in tt:
    for p in g.V(t.id).properties():
         print("key:",p.label, "| value: " ,p.value)
I do not get any output. How can I get the list of labels from the schema?


Re: high-scale-lib dependency

sergeymetallic@...
 

Hm, JanusGraph 0.6.0 uses a different library, https://github.com/datastax/java-driver . Is there any point in keeping the dependency on Apache Cassandra?


Re: high-scale-lib dependency

Clement de Groc
 

Hey! Just wanted to report that we had a similar issue with high-scale-lib.
Replacing high-scale-lib with JCTools sounds like a good option, but I'm not sure it will work for all modules: if I'm not mistaken, Cassandra relies on `high-scale-lib` too.
Another solution could be to exclude all classes under `java/util` from JanusGraph uber-jars.
