Re: Gremlin Query to return count for nodes and edges
Vinayak Bali
Hi All, The query shared by HadoopMarc works. The query I executed returns 752650 nodes and 297302 edges as counts; it takes around 1 minute. Is there any way to optimize it further? Thank you, Marc, and all the others, for your help. Thanks & Regards, Vinayak |
|
Re: Gremlin Query to return count for nodes and edges
Graham Wallis <graham_wallis@...>
Good query from @hadoopmarc
and I like @cmilowka's suggestion, although I needed to modify it very
slightly as follows:
g.V().hasLabel('A').union(__.count(), __.outE().count(), __.outE().inV().count())

That has to be the shortest and neatest solution. Certainly far better than my rather basic effort below, which surely gets the prize for the longest solution :-)

g.V().hasLabel('A').aggregate('a').outE().aggregate('e').inV().aggregate('b').
  select('a').dedup().as('as').select('e').dedup().as('es').select('b').dedup().as('bs').
  select('as','es','bs').by(unfold().count())

Best regards, Graham

Graham Wallis
IBM Open Software
Internet: graham_wallis@...
IBM, Hursley Park, Hursley, Hampshire SO21 2JN

From: "cmilowka" <cmilowka@...>, Date: 23/02/2021 22:49, Subject: Re: [janusgraph-users] Gremlin Query to return count for nodes and edges

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU |
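For readers who want to check the arithmetic of the union step outside the Gremlin console, here is a plain-Python sketch over a toy edge list (the dict and vertex names are illustrative, not JanusGraph API):

```python
# Toy graph: out-edges of each vertex labelled 'A'.
# Mirrors g.V().hasLabel('A').union(__.count(), __.outE().count(), __.outE().inV().count())
out_edges = {
    "a1": ["b1", "b2"],
    "a2": ["b2"],
}

vertex_count = len(out_edges)                           # __.count()
edge_count = sum(len(v) for v in out_edges.values())    # __.outE().count()
in_vertex_count = edge_count                            # __.outE().inV().count(): one inV per edge

print(vertex_count, edge_count, in_vertex_count)  # 2 3 3
```

Note that `outE().inV().count()` equals the edge count (one traverser per edge), which is why a `dedup()` is needed if distinct target vertices are wanted.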
|
JanusGraphIndex how to retrieve constraint (indexOnly) specified for the global index?
Global indexes can be limited to a "constraint" via the indexOnly() step; from the JanusGraph documentation:

mgmt.buildIndex('byNameAndLabel', Vertex.class).addKey(name).indexOnly(god).buildCompositeIndex()

It is handled by the createCompositeIndex(...) Java code via:

addSchemaEdge(indexVertex, (JanusGraphSchemaVertex) constraint, TypeDefinitionCategory.INDEX_SCHEMA_CONSTRAINT, null);
I need to learn how the index was created from a JanusGraphIndex instance, but the method is not there... I need the software to detect it automatically somehow. Is there any other way to know that the index was restricted (indexOnly) to the "god" label?
|
|
Re: Gremlin Query to return count for nodes and edges
It may work as well, to count the totals of all in- and out-edges for the "A" label:

g.V().hasLabel('A').union(__.count(), __.outE().count(), __.inV().count()) |
|
|
Re: Gremlin Query to return count for nodes and edges
hadoopmarc@...
Hi Vinayak,
A new attempt:

g = TinkerFactory.createModern().traversal()
g.withSideEffect('vs', new HashSet()).withSideEffect('es', new HashSet()).
  V(1,2).aggregate('vs').outE().aggregate('es').inV().aggregate('vs').cap('vs', 'es').
  project('vs', 'es').
    by(select('vs').unfold().count()).
    by(select('es').unfold().count())
==>[vs:4,es:3]

This still looks clunky to me, so I challenge other readers to get rid of the project().by(select()) construct. Best wishes, Marc |
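The set-based bookkeeping of the two HashSet side effects can be mirrored in plain Python over a toy copy of the relevant edges of TinkerPop's "modern" graph (illustrative data, not the Gremlin API):

```python
# Out-edges of vertices 1 and 2 in the "modern" graph (vertex 2 has none):
edges = [(1, "knows", 2), (1, "knows", 4), (1, "created", 3)]

vs, es = set(), set()                 # the 'vs' and 'es' HashSet side effects
for v in (1, 2):                      # V(1,2).aggregate('vs')
    vs.add(v)
for i, (src, label, dst) in enumerate(edges):
    es.add(i)                         # outE().aggregate('es') - one entry per edge
    vs.add(dst)                       # inV().aggregate('vs')

print({"vs": len(vs), "es": len(es)})  # {'vs': 4, 'es': 3}
```

The sets deduplicate automatically, which is exactly why the traversal's result is [vs:4, es:3] rather than raw traverser counts.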
|
Re: Changing graphname at runtime
hadoopmarc@...
You really have to try this out and see. I can only answer from what I read in the ref docs.
> Do I need to ConfiguredGraphFactory.close(GRAPH) before I update its configuration?

The docs say the binding between graph name and graph instance renews every 20 seconds, so maybe this is not necessary.

> What happens to GRAPH_TEMP? Wouldn't it still be pointing to the same storage backend HBase table as GRAPH, i.e. to TABLE_B?

GRAPH_TEMP is just a name in the JanusGraphManager memory; it does not matter.

> If I want to reuse the same scheme, I'd have to have some logic so that the next time around I need to renew GRAPH, I have GRAPH_TEMP talk to TABLE_A instead and then switch GRAPH to use TABLE_A, correct?

You are right. I would prefer straight versioning or a timestamp in the table name, or the reuse of names will bite you some day. Of course, you would drop TABLE_A from the storage backend if it is not needed anymore. Best wishes, Marc |
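Marc's suggestion of versioned or timestamped table names can be sketched in plain Python (the naming scheme below is purely illustrative):

```python
import time

def next_table_name(graph_name: str) -> str:
    """Build a fresh, timestamped backend table name so names are never reused."""
    return f"{graph_name}_{int(time.time())}"

# Build the replacement graph against a brand-new table, repoint the public
# graph name at it, then drop the old table once nothing references it.
new_table = next_table_name("GRAPH")
print(new_table)  # e.g. GRAPH_1614152520
```

A monotonically increasing suffix avoids the "GRAPH_TEMP points back at the old table" confusion entirely, because no table name is ever recycled.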
|
Re: Gremlin Query to return count for nodes and edges
Vinayak Bali
Hi Graham, Tried it; the output is as follows:

[{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1}, ... (one such entry per path)]

I want the count to look something like {v1: 20, e: 60, v2: 10} or {v: 30, e: 60}. Thanks & Regards, Vinayak

On Tue, Feb 23, 2021 at 3:00 PM Graham Wallis <graham_wallis@...> wrote: Hi Vinayak |
|
Re: Gremlin Query to return count for nodes and edges
Graham Wallis <graham_wallis@...>
Hi Vinayak
You could do this:

g.V().hasLabel('A').as('a').outE().as('e').inV().as('b').select('a','e','b').by(count())

That should produce something like:

==>{a=1, e=1, b=1}

Best regards, Graham

Graham Wallis
IBM Open Software
Internet: graham_wallis@...
IBM, Hursley Park, Hursley, Hampshire SO21 2JN

From: "Vinayak Bali" <vinayakbali16@...>, Date: 23/02/2021 09:11

Hi Marc, I am using the following query to return the results.

g.V().hasLabel('A').as('a').outE().as('e').inV().as('b').select('a','e','b').by(valueMap().by(unfold()))

Want the count of unique nodes in a and b together, and of e, i.e. the number of edges. Please modify this query to get the required output. Thanks & Regards, Vinayak |
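The gap between the per-path result ({a=1, e=1, b=1} repeated for every path) and the aggregated totals Vinayak asks for can be shown with a plain-Python sketch over toy path data (names are illustrative):

```python
# Each (a, e, b) triple is one traverser/path through the graph.
paths = [("a1", "e1", "b1"), ("a1", "e2", "b2"), ("a2", "e3", "b1")]

# select('a','e','b').by(count()) counts within each single path, hence all 1s:
per_path = [{"a": 1, "e": 1, "b": 1} for _ in paths]

# The desired summary needs deduplication across ALL paths:
summary = {
    "v": len({p[0] for p in paths} | {p[2] for p in paths}),  # unique a and b vertices
    "e": len({p[1] for p in paths}),                          # unique edges
}
print(summary)  # {'v': 4, 'e': 3}
```

In other words, count() inside by() is scoped to one traverser, so an aggregation step (aggregate/dedup or a side-effect set) is required to get global totals.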
|
Re: Gremlin Query to return count for nodes and edges
Vinayak Bali
Hi Marc, I am using the following query to return the results.

g.V().hasLabel('A').as('a').outE().as('e').inV().as('b').select('a','e','b').by(valueMap().by(unfold()))

Want the count of unique nodes in a and b together, and of e, i.e. the number of edges. Please modify this query to get the required output. Thanks & Regards, Vinayak

On Tue, Feb 23, 2021 at 1:08 PM <hadoopmarc@...> wrote: Hi Vinayak, |
|
Re: Gremlin Query to return count for nodes and edges
hadoopmarc@...
Hi Vinayak,
Try:

g.V().project('v', 'vcount', 'ecount').by(identity()).by(count()).by(bothE().count())

Best wishes, Marc |
|
Gremlin Query to return count for nodes and edges
Vinayak Bali
Hi All, I want to return the count of nodes and edges returned by a query. Tried a few queries, but they are not working. Can someone please share a single query that returns both counts? Thanks & Regards, Vinayak |
|
Re: Changing graphname at runtime
Diglio A. Simoni
OK, so that I'm clear, what you're suggesting is that I try something like:
// Create and open main graph
map = new HashMap();
map.put("storage.backend", "hbase");
map.put("storage.hostname", "xx.xx.xx.xx,yy.yy.yy.yy,zz.zz.zz.zz");
map.put("storage.hbase.table", "TABLE_A");
map.put("graph.graphname", "GRAPH");
configuration = new MapConfiguration(map);
configuration.setDelimiterParsingDisabled(true);
ConfiguredGraphFactory.createConfiguration(configuration);
graph = ConfiguredGraphFactory.open("GRAPH");

// Create, open and update replacement graph
map = new HashMap();
map.put("storage.backend", "hbase");
map.put("storage.hostname", "xx.xx.xx.xx,yy.yy.yy.yy,zz.zz.zz.zz");
map.put("storage.hbase.table", "TABLE_B");
map.put("graph.graphname", "GRAPH_TEMP");
configuration = new MapConfiguration(map);
configuration.setDelimiterParsingDisabled(true);
ConfiguredGraphFactory.createConfiguration(configuration);
graph = ConfiguredGraphFactory.open("GRAPH_TEMP");

// Modify GRAPH_TEMP and, when it's time to make it the live one:
map = new HashMap();
map.put("storage.hbase.table", "TABLE_B");
ConfiguredGraphFactory.updateConfiguration("GRAPH", map);
graph = ConfiguredGraphFactory.open("GRAPH");

But that raises some additional questions:
|
|
Re: Janusgraph 0.5.3 potential memory leak
Sadly, that quick fix is not only fast but also incorrect: it requires iterating the `Iterator` twice, which causes incorrect results.
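The failure mode is the single-pass nature of an Iterator: a second traversal silently sees nothing. A minimal plain-Python illustration (Java's Iterator/Iterable contract behaves the same way):

```python
data = iter([1, 2, 3])

first_pass = sum(data)    # consumes the iterator
second_pass = sum(data)   # already exhausted, silently yields 0

print(first_pass, second_pass)  # 6 0

# Passing a factory (in Java: an Iterable, i.e. a "() ->" lambda) hands out
# a fresh iterator for every pass:
make = lambda: iter([1, 2, 3])
print(sum(make()), sum(make()))  # 6 6
```

This is why wrapping the data source in something that can produce a new iterator per pass, rather than reusing one iterator, fixes the incorrect results.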
|
|
Re: Changing graphname at runtime
hadoopmarc@...
Is this what you are looking for (it includes an explicit example):
https://docs.janusgraph.org/basics/configured-graph-factory/#updating-configurations

You can version your graph in the storage and indexing backends, but keep the graph name facing the end user the same. Best wishes, Marc |
|
Changing graphname at runtime
Diglio A. Simoni
Note: I cross-posted this to https://groups.google.com/g/gremlin-users so that I can reach a broader audience, so if you're a member of both groups, please receive my apologies!
Hello, I have a situation where I have a consumer API that talks to JanusGraph. It's configured to connect to a ConfiguredGraphFactory graph with graphname "Graph_A". I want to update Graph_A in *its totality*, which implies dropping it and recreating it from scratch. The problem is that such a process takes a long time, and I don't want the system to be down while Graph_A is being rebuilt. So I have another ConfiguredGraphFactory graph with graphname Graph_B. I take whatever time is required to create Graph_B, and when it's done, I need to stop the system, change the configuration of the API to connect to Graph_B, and restart the system.
But I don't want to do that.
Instead what I'd like to do is: when Graph_B is ready, I would ConfiguredGraphFactory.drop('Graph_A') and *rename* Graph_B to Graph_A.
Is that possible? If not, does anybody have another solution? This is akin to what one does in computer graphics and double buffering....
|
|
Re: Inconsistent composite index status after transaction failure
Boxuan Li
Try this (v is the ghost vertex):
Iterator<JanusGraphRelation> iterator = v.query().noPartitionRestriction().relations().iterator();
|
|
Re: REINDEXING Big Graph
hadoopmarc@...
Hi Abhay,
The Hadoop client picks up configs from the JVM classpath. So, simply add /etc/hadoop/conf (or whichever folder holds hdfs-site.xml and the other cluster configs) to your classpath. I have never done this myself for the indexing MR jobs, nor seen it on this forum, so you may well encounter further barriers... HTH, Marc |
|
REINDEXING Big Graph
Abhay Pandit
Hi Team, Currently I am trying to REINDEX using Hadoop MapReduce, following the JanusGraph documentation: https://docs.janusgraph.org/index-management/index-reindexing/#reindex-example-on-mapreduce

I wrote my implementation in Java. It is running fine, but only in local mode. For running in cluster mode I need to pass the Hadoop configuration, but from the documentation it is not clear to me how to pass any external configuration to run on a Hadoop or YARN cluster. If anybody has tried this against a big graph, say a billion nodes, can you guide me on this?

My Java implementation:

JanusGraph janusGraph = JanusGraphFactory.open(janusConfig);
JanusGraphManagement management = janusGraph.openManagement();
JanusGraphIndex graphIndex = management.getGraphIndex("AddressId");
MapReduceIndexManagement mapReduceIndexManagement = new MapReduceIndexManagement(janusGraph);
ScanMetrics scanMetrics = mapReduceIndexManagement.updateIndex(graphIndex, SchemaAction.REINDEX).get();

janusConfig:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1
storage.port=9042
storage.keyspace=janusgraph
cache.db-cache=false
cache.db-cache-clean-wait=20
cache.db-cache-time=180000
cache.db-cache-size=0.25
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

Console log:

[INFO] 2021-02-17 13:37:55,173 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.LocalJobRunner - {} -
[INFO] 2021-02-17 13:37:56,141 task-1 org.apache.hadoop.mapreduce.Job - {} - map 14% reduce 0%
[INFO] 2021-02-17 13:37:57,384 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.Task - {} - Task:attempt_local67526867_0001_m_000035_0 is done. And is in the process of committing
[INFO] 2021-02-17 13:37:57,384 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.LocalJobRunner - {} - map
[INFO] 2021-02-17 13:37:57,384 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.Task - {} - Task 'attempt_local67526867_0001_m_000035_0' done.
[INFO] 2021-02-17 13:37:57,385 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.Task - {} - Final Counters for attempt_local67526867_0001_m_000035_0: Counters: 16

Thanks, Abhay |
|
Re: Janusgraph 0.5.3 potential memory leak
Hey @mad, thanks for your benchmark code! I ran a few experiments with it today and found that creating an Iterable from the Iterator already solves the problem. I added the following function to the benchmark: @Benchmark
Benchmark                                  (size)  (valueSize)   Mode  Cnt     Score  Error  Units
StaticArrayEntryListBenchmark.iterable      10000           50  thrpt    2  3954.258         ops/s
StaticArrayEntryListBenchmark.iterable      10000         1000  thrpt    2   305.872         ops/s
StaticArrayEntryListBenchmark.iterable      10000         5000  thrpt    2    85.734         ops/s
StaticArrayEntryListBenchmark.iterable     100000           50  thrpt    2   224.861         ops/s
StaticArrayEntryListBenchmark.iterable     100000         1000  thrpt    2    19.816         ops/s
StaticArrayEntryListBenchmark.iterable     100000         5000  thrpt    2     7.058         ops/s
StaticArrayEntryListBenchmark.iterator      10000           50  thrpt    2  1619.764         ops/s
StaticArrayEntryListBenchmark.iterator      10000         1000  thrpt    2   142.065         ops/s
StaticArrayEntryListBenchmark.iterator      10000         5000  thrpt    2    27.785         ops/s
StaticArrayEntryListBenchmark.iterator     100000           50  thrpt    2   181.209         ops/s
StaticArrayEntryListBenchmark.iterator     100000         1000  thrpt    2    17.115         ops/s
The throughput is almost as high as with the Iterable variant, and even the OOM does not occur anymore. If that also fixes the original problem stated at the beginning of this thread, the solution is just a `() ->` away! |
|
Re: Inconsistent composite index status after transaction failure
simone3.cattani@...
I already tried to delete it, both by referencing the actual vertex id and by using the query based on the "inconsistent" index, but in both cases it doesn't work (the drop returns correctly, as if it had nothing to delete).
|
|