Hi Jan,
So I tried it again. First of all, I remembered that for cql I need to commit after each step. Otherwise I get "violation of unique key" errors, even though I am not actually violating any unique key. Is this supposed to be the case (having to commit each time)? When committing after each function call, I found that with the adapted properties configuration (see last reply) it is really slow. With the "default" configuration for cql it is a bit faster, but still much slower than in the embedded case. I also tried it with another graph, which I persisted like this:

    public void persist(Map<Integer, Map<String,Object>> nodes,
                        Map<Integer,Integer> edges,
                        Map<Integer,Map<String,String>> names) {
        g = graph.traversal();

        int counter = 0;
        for (Map.Entry<Integer, Map<String,Object>> e : nodes.entrySet()) {
            Vertex v = g.addV()
                    .property("taxId", e.getKey())
                    .property("rank", e.getValue().get("rank"))
                    .property("divId", e.getValue().get("divId"))
                    .property("genId", e.getValue().get("genId"))
                    .next();
            g.tx().commit();
            Map<String,String> n = names.get(e.getKey());
            if (n != null) {
                for (Map.Entry<String,String> vals : n.entrySet()) {
                    g.V(v).property(vals.getKey(), vals.getValue()).iterate();
                    g.tx().commit();
                }
            }
            if (counter % BULK_CHOP_SIZE == 0) {
                System.out.println(counter);
            }
            counter++;
        }

        counter = 0;
        for (Map.Entry<Integer,Integer> e : edges.entrySet()) {
            g.V().has("taxId", e.getKey()).as("v1")
                    .V().has("taxId", e.getValue()).as("v2")
                    .addE("has_parent").from("v1").to("v2")
                    .iterate();
            g.tx().commit();
            if (counter % BULK_CHOP_SIZE == 0) {
                System.out.println(counter);
            }
            counter++;
        }

        g.V().has("taxId", 1).as("v").outE().filter(__.inV().where(P.eq("v"))).drop().iterate();
        g.tx().commit();
        System.out.println("Done with persistence");
    }

I had the same problem in either case.
I am probably using the cql backend wrong somehow and would appreciate any help on what else to do! Thanks, Lilly
On Tuesday, 8 October 2019 at 09:05:56 UTC+2, Lilly wrote:
Hi Jan, Ok then I probably screwed up somewhere. I kind of thought this was to be expected, which is why I did not check it more thoroughly.
Maybe the way I persisted is not working well for cql. I will try to create a test scenario where I do not have to persist all my data and see how it performs with cql again.
In principle, what I do is call this function:

    public void updateEdges(String kmer, int pos, boolean strand, int record,
                            List<SequenceParser.Feature> features) {
        if (features == null) {
            features = Arrays.asList();
        }
        g.withSideEffect("features", features)
            .V().has("prefix", kmer.substring(0, kmer.length()-1)).fold()
            .coalesce(__.unfold(),
                      __.addV("prefix_node").property("prefix", kmer.substring(0, kmer.length()-1))).as("v1")
            .coalesce(__.V().has("prefix", kmer.substring(1, kmer.length())),
                      __.addV("prefix_node").property("prefix", kmer.substring(1, kmer.length()))).as("v2")
            .sideEffect(__.choose(__.select("features").unfold().count().is(P.eq(0)),
                                  __.addE("suffix_edge").property("record", record)
                                    .property("strand", strand).property("pos", pos).from("v1").to("v2"))
                .select("features").unfold()
                .addE("suffix_edge").property("record", record).property("strand", strand).property("pos", pos)
                .property(__.map(t -> ((SequenceParser.Feature) t.get()).category),
                          __.map(t -> ((SequenceParser.Feature) t.get()).feature)).from("v1").to("v2"))
            .iterate();
    }

and I commit roughly every 50000 calls. As a side remark, all of the above properties are indexed. Feature is a simple class with two attributes, category and feature.
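To make the commit cadence concrete, here is a minimal sketch of that pattern in isolation. CommitBatcher is a hypothetical helper (not part of JanusGraph or TinkerPop), and the batch size of 50000 is only the rough figure mentioned above. In the real loader the commit action would presumably be something like () -> g.tx().commit(); here it is a plain counter so the sketch runs on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: runs a commit action once every batchSize operations. */
final class CommitBatcher {
    private final int batchSize;
    private final Runnable commitAction; // assumption: e.g. () -> g.tx().commit()
    private int pending = 0;

    CommitBatcher(int batchSize, Runnable commitAction) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.commitAction = commitAction;
    }

    /** Call once after each write; commits when the batch is full. */
    void operationDone() {
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    /** Commits whatever is still pending; does nothing if there is no pending work. */
    void flush() {
        if (pending > 0) {
            commitAction.run();
            pending = 0;
        }
    }

    public static void main(String[] args) {
        AtomicInteger commits = new AtomicInteger();
        CommitBatcher batcher = new CommitBatcher(50_000, commits::incrementAndGet);
        for (int i = 0; i < 120_000; i++) {
            // a call such as updateEdges(...) would go here
            batcher.operationDone();
        }
        batcher.flush(); // commit the final partial batch
        System.out.println("commits: " + commits.get()); // prints "commits: 3"
    }
}
```

The final flush() matters: without it, the last partial batch would never be committed.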
Also I adapted the configuration file in the following way:
storage.batch-loading = true
ids.block-size = 100000
ids.authority.wait-time = 2000 ms
ids.renew-timeout = 1000000 ms
I tried the same with cql and embedded.
I will get back to you once I have tested it once again. But maybe you already spot an issue? Thanks
Lilly
On Monday, 7 October 2019 at 20:14:29 UTC+2, fa...@... wrote:
We don't see this problem on persistence.
It would be good to know what takes longer. Would you like to give some more information?
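One rough way to find out what takes longer is to time the write calls and the commits separately. The sketch below is only an illustration: PhaseTimer is a hypothetical helper (not part of JanusGraph or TinkerPop), and the phase names "write" and "commit" are placeholders for whatever steps the loader actually runs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: accumulates wall-clock time and call counts per named phase. */
final class PhaseTimer {
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();
    private final Map<String, Long> calls = new LinkedHashMap<>();

    /** Runs the work and adds its duration to the given phase, even if it throws. */
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            totalNanos.merge(phase, System.nanoTime() - start, Long::sum);
            calls.merge(phase, 1L, Long::sum);
        }
    }

    long calls(String phase) {
        return calls.getOrDefault(phase, 0L);
    }

    long totalNanos(String phase) {
        return totalNanos.getOrDefault(phase, 0L);
    }

    /** One line per phase: name, number of calls, total milliseconds. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : totalNanos.entrySet()) {
            sb.append(e.getKey())
              .append(": calls=").append(calls(e.getKey()))
              .append(", totalMs=").append(e.getValue() / 1_000_000)
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int i = 0; i < 1_000; i++) {
            // assumption: in the real loader this would wrap e.g. updateEdges(...)
            timer.time("write", () -> Math.sqrt(42.0));
            if (i % 100 == 0) {
                // assumption: in the real loader this would wrap g.tx().commit()
                timer.time("commit", () -> Math.sqrt(42.0));
            }
        }
        System.out.print(timer.report());
    }
}
```

Comparing the totals for the cql and embedded backends should show whether the time goes into the traversals themselves or into the commits.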
Jan