Re: olap connection with spark standalone cluster
Hi Lilly,
This error says that there are somehow two versions of the TinkerPop jars in your project. If you use Maven, you can check this with the dependency plugin.
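For example, from the project root (mvn dependency:tree is the standard goal for this; the includes filter below just narrows the output):

# Show every TinkerPop artifact on the classpath so duplicate versions stand out
mvn dependency:tree -Dincludes=org.apache.tinkerpop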
If other problems appear, also make sure that the Spark cluster itself is doing fine by running one of the examples from the Spark distribution with spark-submit.
HTH, Marc
On Tuesday, 15 October 2019 at 09:38:08 UTC+2, Lilly wrote:
How rollback works in JanusGraph: will it roll back the storage write in one transaction?
Hi, consider the sample code below. The storage backend is HBase and "name" is used in an index, so a commit results in at least two row updates. But what if the index update succeeds while the vertex update fails (throws an exception)? When we call rollback, will it roll back the index write to storage?
try {
    user = graph.addVertex()
    user.property("name", name)
    graph.tx().commit()
} catch (Exception e) {
    // Recover, retry, or return error message
    println(e.getMessage())
    graph.tx().rollback() // <------- Added line
}
olap connection with spark standalone cluster
Hi everyone,
I downloaded a fresh Spark binary release (spark-2.4.0-hadoop2.7) and set the master to spark://127.0.0.1:7077. I then started all services via $SPARK_HOME/sbin/start-all.sh. I checked that Spark works with the provided example programs.
Furthermore, I am using the janusgraph-0.4.0-hadoop2 binary.
Now I configured read-cassandra-3.properties as follows:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
janusgraphmr.ioformat.conf.storage.backend=cassandra
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9160
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
spark.master=spark://127.0.0.1:7077
spark.executor.memory=8g
spark.executor.extraClassPath=/home/janusgraph-0.4.0-hadoop2/lib/*
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
where the janusgraph libraries are stored in /home/janusgraph-0.4.0-hadoop2/lib/*
In my Java application I now tried:

Graph graph = GraphFactory.open("...");
GraphTraversalSource g = graph.traversal().withComputer(SparkGraphComputer.class);

and then g.V().count().next(). I get the error message:

ERROR org.apache.spark.scheduler.TaskSetManager - Task 3 in stage 0.0 failed 4 times; aborting job
Exception in thread "main" java.lang.IllegalStateException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 15, 192.168.178.32, executor 0): java.io.InvalidClassException: org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal; local class incompatible: stream classdesc serialVersionUID = -3191185630641472442, local class serialVersionUID = 6523257080464450267
Any ideas as to what might be the problem? Thanks! Lilly
New committer: Dmitry Kovalev
"Florian Hockmann" <f...@...>
On behalf of the JanusGraph Technical Steering Committee (TSC), I'm pleased to welcome a new committer on the project!
Dmitry Kovalev made a major contribution with the production-ready in-memory backend. He was quite responsive and patient during the review process, and he also contributed to development decisions. Congratulations, Dmitry!
Re: [QUESTION] Usage of the cassandraembedded
I have now experimented with many kinds of settings for the CQL connection and timed how long it took. My observations are the following:
- Embedded with bulk loading took 16 min.
- CQL without bulk loading is extremely slow: > 2 h.
- CQL with bulk loading (same settings as for embedded for the parameters storage.batch-loading, ids.block-size, ids.renew-timeout, cache.db-cache, cache.db-cache-clean-wait, cache.db-cache-time, cache.db-cache-size) took 27 min and took up a considerable amount of my RAM (not the case in embedded mode).
- CQL as above, but additionally with storage.cql.batch-statement-size = 500 and storage.batch-loading = true, took 24 min and not quite as much RAM.
I honestly do not know what else might be the issue.
On Wednesday, 9 October 2019 at 08:17:13 UTC+2, fa...@... wrote:
For "violation of unique key" it could be the case that cql checks id's to be unique (JanusGraph could run out of id's in the batch loading mode) but i'm not sure what the embedded backend is doing.
Re: index not used for query
Anatoly Belikov <awbe...@...>
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
index.search.elasticsearch.client-only=true
Do you think it is due to Elasticsearch?
On Wednesday, 2 October 2019 14:06:01 UTC+3, arnab kumar pan wrote: I am facing the same issue while creating a mixed index. Can you share your Elasticsearch configuration?

On Tuesday, September 24, 2019 at 7:26:43 PM UTC+5:30, aw...@... wrote: Hello, I have made an index for the vertex property "id". The index is enabled, but it is still not used for the query, according to the profiler. Please give me advice on how to make the index work.

gremlin> vindex = mgmt.getGraphIndex("byId")
gremlin> vindex.fieldKeys
==>id
gremlin> mgmt.awaitGraphIndexStatus(graph, vindex.name()).status(SchemaStatus.ENABLED).call()
==>GraphIndexStatusReport[success=true, indexName='byId', targetStatus=[ENABLED], notConverged={}, converged={id=ENABLED}, elapsed=PT0.001S]
gremlin> g.V().has('id', '-9032656531829342390').profile()
==>Traversal Metrics
Step                                                        Count  Traversers  Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[id.eq(-9032656531829342390)])                1           1   2230.851   100.00
  \_condition=(id = -9032656531829342390)
  \_isFitted=false
  \_query=[]
  \_orders=[]
  \_isOrdered=true
  optimization                                                                     0.005
  optimization                                                                     0.026
  scan                                                                             0.000
  \_condition=VERTEX
  \_query=[]
  \_fullscan=true
                                            >TOTAL              -           -   2230.851
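For reference, a composite index that JanusGraphStep can actually use (isFitted=true) is typically set up as below. This is only a sketch: the Long data type is an assumption, and note that the profiled query above passes the value as a quoted String, so a mismatch between the query value's type and the property key's data type would by itself prevent index use.

// Hypothetical schema setup for a composite index 'byId' on property 'id'
JanusGraphManagement mgmt = graph.openManagement();
PropertyKey idKey = mgmt.makePropertyKey("id").dataType(Long.class).make();
mgmt.buildIndex("byId", Vertex.class).addKey(idKey).buildCompositeIndex();
mgmt.commit();

// With a Long key, the lookup value must also be a Long for the index to fit:
g.V().has("id", -9032656531829342390L).toList();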
Re: [QUESTION] Usage of the cassandraembedded
For "violation of unique key" it could be the case that cql checks id's to be unique (JanusGraph could run out of id's in the batch loading mode) but i'm not sure what the embedded backend is doing.
I have never used the batch loading mode; see also here: https://docs.janusgraph.org/advanced-topics/bulk-loading/.
On Tuesday, 8 October 2019 at 17:50:23 UTC+2, Lilly wrote:
Re: [QUESTION] Usage of the cassandraembedded
Your ids.block-size should be large in this example; see the section on id creation in https://www.experoinc.com/post/janusgraph-nuts-and-bolts-part-1-write-performance
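For illustration, the relevant id-allocation knobs sit in the graph properties file; the values below are an illustrative sketch for a bulk load, not recommendations:

# Large id blocks mean writers rarely have to go back to the id authority
storage.batch-loading = true
ids.block-size = 10000000
ids.renew-timeout = 3600000 ms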
On Tuesday, 8 October 2019 at 09:05:56 UTC+2, Lilly wrote:
JanusGraph sessions at Scylla Summit
Peter Corless <pe...@...>
Hello everyone! Though I generally just lurk and absorb everyone's collective wisdom, today I wanted to let you know we'll have a pair of JanusGraph practitioners speaking at Scylla Summit this year, November 5-6 in San Francisco:
- Brian Hall of Expero
- Ryan Stauffer of Enharmonic
We published a blog today regarding their upcoming talks. JanusGraph has been a perennial topic at Scylla Summit since 2016, so I could not be more pleased to continue the tradition of showcasing its capabilities and use cases with our audience.
Also, forgive me for sounding all too marketing-y, but if anyone on the list is interested in attending these sessions at Scylla Summit, feel free to use the discount code JANUSGRAPHUSERS25 for 25% off.
With that, I'll let you get back to the heart of your technical discussions. Enjoy the day!
-Peter.
--
Peter Corless
Technical Marketing Manager
650-906-3134
Re: [QUESTION] Usage of the cassandraembedded
Hi Jan,
So I tried it again. First of all, I remembered that for CQL I need to commit after each step; otherwise I get "violation of unique key" errors, even though I am not actually violating one. Is this supposed to be the case (having to commit each time)? Now, committing after each function call, I found that with the adapted properties configuration (see my last reply) it is really super slow. If I use the "default" configuration for CQL, it is a bit faster but still much slower than in the embedded case. I also tried it with another graph, which I persisted like this:
public void persist(Map<Integer, Map<String, Object>> nodes,
                    Map<Integer, Integer> edges,
                    Map<Integer, Map<String, String>> names) {
    g = graph.traversal();

    // First pass: one vertex per node, committed individually
    int counter = 0;
    for (Map.Entry<Integer, Map<String, Object>> e : nodes.entrySet()) {
        Vertex v = g.addV().property("taxId", e.getKey())
                    .property("rank", e.getValue().get("rank"))
                    .property("divId", e.getValue().get("divId"))
                    .property("genId", e.getValue().get("genId")).next();
        g.tx().commit();
        Map<String, String> n = names.get(e.getKey());
        if (n != null) {
            for (Map.Entry<String, String> vals : n.entrySet()) {
                g.V(v).property(vals.getKey(), vals.getValue()).iterate();
                g.tx().commit();
            }
        }
        if (counter % BULK_CHOP_SIZE == 0) {
            System.out.println(counter);
        }
        counter++;
    }

    // Second pass: one "has_parent" edge per entry, committed individually
    counter = 0;
    for (Map.Entry<Integer, Integer> e : edges.entrySet()) {
        g.V().has("taxId", e.getKey()).as("v1")
         .V().has("taxId", e.getValue()).as("v2")
         .addE("has_parent").from("v1").to("v2").iterate();
        g.tx().commit();
        if (counter % BULK_CHOP_SIZE == 0) {
            System.out.println(counter);
        }
        counter++;
    }

    // Remove the self-loop on the root vertex (taxId == 1)
    g.V().has("taxId", 1).as("v").outE().filter(__.inV().where(P.eq("v"))).drop().iterate();
    g.tx().commit();
    System.out.println("Done with persistence");
}
And I had the same problem in either case.
I am probably using the CQL backend wrong somehow and would appreciate any help on what else to do! Thanks, Lilly
On Tuesday, 8 October 2019 at 09:05:56 UTC+2, Lilly wrote:
Re: [QUESTION] Usage of the cassandraembedded
Hi Jan, OK, then I probably messed up somewhere. I had assumed this was to be expected, which is why I did not check it more thoroughly.
Maybe the way I persisted the data does not work well for CQL. I will try to create a test scenario where I do not have to persist all my data and see how it performs with CQL again.
In principle, what I do is call this function:

public void updateEdges(String kmer, int pos, boolean strand, int record,
                        List<SequenceParser.Feature> features) {
    if (features == null) {
        features = Arrays.asList();
    }
    g.withSideEffect("features", features)
     // get or create the vertex for the (k-1)-prefix of the kmer
     .V().has("prefix", kmer.substring(0, kmer.length() - 1)).fold()
     .coalesce(__.unfold(),
               __.addV("prefix_node").property("prefix", kmer.substring(0, kmer.length() - 1)))
     .as("v1")
     // get or create the vertex for the (k-1)-suffix of the kmer
     .coalesce(__.V().has("prefix", kmer.substring(1, kmer.length())),
               __.addV("prefix_node").property("prefix", kmer.substring(1, kmer.length())))
     .as("v2")
     // add one plain edge if there are no features, else one edge per feature
     .sideEffect(__.choose(__.select("features").unfold().count().is(P.eq(0)),
                           __.addE("suffix_edge").property("record", record)
                             .property("strand", strand).property("pos", pos).from("v1").to("v2"))
                   .select("features").unfold()
                   .addE("suffix_edge").property("record", record).property("strand", strand)
                   .property("pos", pos)
                   .property(__.map(t -> ((SequenceParser.Feature) t.get()).category),
                             __.map(t -> ((SequenceParser.Feature) t.get()).feature))
                   .from("v1").to("v2"))
     .iterate();
}

Roughly every 50000 calls I do a commit. As a side remark, all of the above properties have indices. And Feature is a simple class with two attributes, category and feature.
Also, I adapted the configuration file in the following way:

storage.batch-loading = true
ids.block-size = 100000
ids.authority.wait-time = 2000 ms
ids.renew-timeout = 1000000 ms
I tried the same with CQL and embedded.
I will get back to you once I have tested it once again. But maybe you already spot an issue? Thanks
Lilly
On Monday, 7 October 2019 at 20:14:29 UTC+2, fa...@... wrote:
Re: [QUESTION] Usage of the cassandraembedded
Hi, I think that embedded Cassandra can lead to some classpath hell, so this option should at least not be possible with a default installation. I have a project where, in a first version, I put my library into JanusGraph. When I had to parse CSV, I found that JG embedded some old CSV libraries, and updating them might have had unexpected effects on JG, so I used the old version even though it was imperfect. In a second version, I use a Spring Boot application with a remote connection to JG to prevent such issues. Elasticsearch disabled its embedded mode for the same reason (see https://www.elastic.co/blog/elasticsearch-the-server).

Regards, Nicolas
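The remote-connection setup described here looks roughly like this with the TinkerPop driver (a minimal sketch; host, port, and the remote traversal source name "g" are assumptions about the server configuration):

import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

// Talk to a Gremlin Server in front of JanusGraph instead of embedding the backend
Cluster cluster = Cluster.build("localhost").port(8182).create();
GraphTraversalSource g = AnonymousTraversalSource.traversal()
        .withRemote(DriverRemoteConnection.using(cluster, "g"));
System.out.println(g.V().count().next());
cluster.close();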
Re: [QUESTION] Usage of the cassandraembedded
We don't see this problem on persistence.
It would be good to know what takes longer. Would you like to give some more information?
Jan
Re: Persistence of graph view
Hi Lilly,
Thanks for explaining, I already feared that I had missed something. I think each type of query has its optimal treatment. When you have two properties to select on, you have these cases:
- a small result set (let us say smaller than 1000 vertices). This is served well by the default CompositeIndex or MixedIndex on these two property keys.
- a large result set. Here it is probably more efficient to work with stored vertex ids. However, now you store the ids as a dictionary with the values of p2 as keys, so your query becomes g.V(ids[p2_value]); see the sketch below.
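A minimal sketch of that id dictionary, assuming an embedded GraphTraversalSource g and the illustrative property names p1 and p2:

// One-time: collect the ids of the view's vertices (p1 = x), grouped by p2 value
Map<Object, List<Object>> ids = new HashMap<>();
g.V().has("p1", "x").forEachRemaining(v ->
        ids.computeIfAbsent(v.value("p2"), k -> new ArrayList<>()).add(v.id()));

// Query time: start directly from the stored ids, no index lookup needed.
// Guard against an empty list, since g.V() with no ids would scan everything.
List<Object> hit = ids.getOrDefault(p2Value, Collections.emptyList());
List<Vertex> result = hit.isEmpty() ? Collections.emptyList() : g.V(hit.toArray()).toList();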
If you can work out what performs best, it would be interesting to read about your results in a blog!
By the way, your use case of getting a large set of vertices as the start of a traversal is possibly better served by PostgreSQL or some linearly scalable SQL store. JanusGraph shines at longer traversals from a small number of starting vertices.
Best wishes, Marc
On Monday, 7 October 2019 at 15:58:36 UTC+2, Lilly wrote:
Re: Titan to Janus - Change in behavior for properties with the same name but different datatypes.
Bharat Dighe <bdi...@...>
Thanks Abhay and Marc.
It came as a bit of a surprise due to the existing behavior in Titan. This is quite restrictive given the nature of my app: other than a few fixed properties defined by the system, the properties are stamped by external sources. I will need to redesign the app given this finding.
Bharat
On Sunday, October 6, 2019 at 9:00:57 AM UTC-7, Abhay Pandit wrote: Hi Bharat, JanusGraph, being more consistent, stores only one PropertyKey with a given name and with only one data type throughout the graph. So in your case, "status" can't have two data types in one graph.
Thanks, Abhay
On Sun, 6 Oct 2019 at 01:03, <ma...@...> wrote: Hi Bharat,
I understand your annoyance while porting your application, but to me the JanusGraph behaviour seems more consistent (by the way, I did not check the difference in behaviour you report; I just took your observation for granted). If you want the old Titan behaviour, you can simply typecast your variable-type properties to their common denominator (like String, Long, Double, Object, whatever does the job) before you pass them to JanusGraph.
HTH, Marc
On Saturday, 5 October 2019 at 07:31:56 UTC+2, Bharat Dighe wrote: There is a significant difference in the way Titan and Janus handle properties with the same name whose values have different datatypes: Titan allows it but Janus does not. I am in the process of porting my app from Titan to Janus, and this is causing a major issue. In my app the properties are added dynamically; other than a few fixed properties, there is no predictability as to which properties occur and what their datatypes are.
Is there a way Janus can be made to behave the same as Titan?
Here is an example of the difference in behavior between Titan and Janus:
Titan
=====
gremlin> v1 = graph.addVertex();
==>v[4144]
gremlin> v2 = graph.addVertex();
==>v[4096]
gremlin> v1.property("status", 1);
==>vp[status->1]
gremlin> v2.property("status", "connected");
==>vp[status->connected]
gremlin> v1.property("size", 2000000)
==>vp[size->2000000]
gremlin> v2.property("size", 3000000000);
==>vp[size->3000000000]
gremlin> v1.property("status").value().getClass();
==>class java.lang.Integer
gremlin> v2.property("status").value().getClass();
==>class java.lang.String
gremlin> v1.property("size").value().getClass();
==>class java.lang.Integer
gremlin> v2.property("size").value().getClass();
==>class java.lang.Long
Janus
=====
gremlin> v1 = graph.addVertex();
==>v[4104]
gremlin> v2 = graph.addVertex();
==>v[4176]
gremlin> v1.property("status", 1);
==>vp[status->1]
gremlin> graph.tx().commit();
==>null
gremlin> v2.property("status", "connected");
Value [connected] is not an instance of the expected data type for property key [status] and cannot be converted. Expected: class java.lang.Integer, found: class java.lang.String
Type ':help' or ':h' for help. Display stack trace? [yN]n
gremlin> v1.property("size", 2000000)
==>vp[size->2000000]
gremlin> v2.property("size", 3000000000);
Value [3000000000] is not an instance of the expected data type for property key [size] and cannot be converted. Expected: class java.lang.Integer, found: class java.lang.Long
Type ':help' or ':h' for help. Display stack trace? [yN]n
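A minimal sketch of the typecast workaround Marc suggests above, assuming the dynamic values arrive as Object (the helper name is illustrative):

// Coerce every dynamic value to String so each PropertyKey keeps a single data type
void setDynamicProperty(Vertex v, String key, Object value) {
    v.property(key, String.valueOf(value));
}

The reader side then parses the String back into the type it expects.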
Re: Persistence of graph view
Hi Marc,
I guess I did not explain my issue very well. What I meant to say is this: suppose these ids correspond to some filtering criterion. Having these ids, I can create the subgraph.
However, if on this subgraph I want to use another index (not the one related to the filtering criterion), say on "property", it will not be used.
A (hopefully) simple example: say I have a graph with properties p1 and p2 and indices on both. Now I get the ids of all vertices that have p1=x and store them in ids. Doing g.V(ids).has(p2, ...) will not make use of the index on p2; at least it does not show up in the profile step.
Is it clear now what I mean? Or am I mistaken?
Thanks, Lilly
On Monday, 7 October 2019 at 15:49:01 UTC+2, ma...@... wrote:
Re: Persistence of graph view
Hi Lilly,
When you have the vertex id, you do not need any index. The index is a lookup table from property value to vertex id.
Cheers, Marc
On Monday, 7 October 2019 at 08:15:50 UTC+2, Lilly wrote:
Hi Marc,
Thanks for your reply!
Your suggestions would fetch the subgraph efficiently. However, on this subgraph I could no longer use any of my other indices. Say I have an index on "property": then g.V(ids).has("property", ...) would no longer make use of the index on "property" (only g.V().has("property", ...) does). Especially when the subgraph is still rather large, this would be desirable though.
Any thoughts on how to achieve this?
Thanks
Lilly
On Sunday, 6 October 2019 at 09:47:25 UTC+2, ma...@... wrote: Hi Lilly,
Interesting question. For the JanusGraph backends to look up the vertices of the subgraph efficiently, they need the ids of the vertices. The traversal is then g.V(ids). There are different ways to get these ids:
- store the ids on ingestion
- query the ids once and store them
- give the subgraph vertices a specific property and put an index on that property. I doubt, however, that this will be efficient for large subgraphs. @Anyone ever tried?
- maybe the JanusGraph IDPlacementStrategy could provide a way to query only the subgraph vertices without knowing their explicit ids. Seems complicated compared to the first two options.
Cheers, Marc
On Friday, 4 October 2019 at 17:48:52 UTC+2, Lilly wrote: Hi,
I persisted a JanusGraph graph g1 (with a Cassandra backend, if that is relevant). Now I would like to persist a "view" of this graph g1, i.e. a subgraph g2 of g1 which only contains some of the nodes and edges of g1. This subgraph should also possess all the indices of the affected nodes and edges.
I am aware of the SubgraphStrategy, which can create such a view at runtime. Is it possible to persist this view? I would like to avoid having to create this view all over again each time. Also, with the view created at runtime, I can no longer exploit other indices.
If this is not possible, is there another way to achieve this?
Thanks a lot!! Lilly
Re: Exception in java11 while commit [ org.janusgraph.diskstorage.locking.PermanentLockingException: Permanent locking failure]
Florian Hockmann <f...@...>
Java 11 is simply not supported right now. So you have to use Java 8:
JanusGraph requires Java 8 (Standard Edition).
On Tuesday, 1 October 2019 at 16:34:59 UTC+2, nix...@... wrote:
JanusGraph dev,
I am getting the exception below from management.commit() when using Java 11; it works well with Java 8. I am using JanusGraph version 0.4.0. Any help on this is appreciated.
2019-09-23 05:06:37,707 ERROR - [main:] ~ Could not commit transaction [1] due to storage exception in system-commit (StandardJanusGraph:710)
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:56)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:91)
.
.
Caused by: org.janusgraph.diskstorage.locking.PermanentLockingException: Permanent locking failure
at org.janusgraph.diskstorage.locking.AbstractLocker.checkLocks(AbstractLocker.java:359)
at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingTransaction.checkAllLocks(ExpectedValueCheckingTransaction.java:175)
at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingTransaction.prepareForMutations(ExpectedValueCheckingTransaction.java:154)
at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStoreManager.mutateMany(ExpectedValueCheckingStoreManager.java:72)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:94)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:91)
at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
... 85 more
Caused by: org.janusgraph.diskstorage.PermanentBackendException: Read 1 locks with our rid 48- 97- 52- 49- 51- 48- 55-100- 49- 52- 53- 57- 50- 45-109- 97- 48- 57- 50- 51- 45- 49- 45-118-112- 99- 45- 99-108-111-117-100-101-114- 97- 45- 99-111-109- 49 but mismatched timestamps; no lock column contained our timestamp (2019-09-23T12:06:37.571103Z)
at org.janusgraph.diskstorage.locking.consistentkey.ConsistentKeyLocker.checkSeniority(ConsistentKeyLocker.java:528)
at org.janusgraph.diskstorage.locking.consistentkey.ConsistentKeyLocker.checkSingleLock(ConsistentKeyLocker.java:454)
at org.janusgraph.diskstorage.locking.consistentkey.ConsistentKeyLocker.checkSingleLock(ConsistentKeyLocker.java:118)
at org.janusgraph.diskstorage.locking.AbstractLocker.checkLocks(AbstractLocker.java:351)
... 92 more
2019-09-23 05:06:37,711 ERROR - [main:] ~ Could not commit transaction [1] due to exception (StandardJanusGraph:794)
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:56)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:91)
Florian Hockmann <f...@...>
Please share a code listing that shows how you created the indices and the Gremlin traversal that results in this warning.
On Wednesday, 2 October 2019 at 13:06:02 UTC+2, arnab kumar pan wrote:
All indexes are shown as enabled, but at query time I am getting the warning "use index for better performance". Please help. Index backend: Elasticsearch, DB backend: Cassandra.
Re: How to take backup and restore of Janusgraph data backed by scylladb?
Florian Hockmann <f...@...>
JanusGraph doesn't provide any utility for this on its own. You can instead simply refer to your backend, ScyllaDB in your case, and back it up just like you would without JanusGraph. If you later restore such a backup with Scylla, JanusGraph will simply work with the restored data, and hence you have restored your graph.
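For example, with Scylla a backup is typically a per-node snapshot of the keyspace that JanusGraph writes to (the keyspace name below is an assumption; check the keyspace configured for your graph):

# Snapshot the JanusGraph keyspace on each Scylla node
nodetool snapshot janusgraph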
On Thursday, 3 October 2019 at 13:35:43 UTC+2, tsh...@... wrote:
Hi,
I need help taking a backup of my JanusGraph data, which is stored in ScyllaDB. I'm very new to this concept. Can someone explain the steps to back up and restore my data, please?
Thanks a lot for your time.