
Re: Changing graphname at runtime

hadoopmarc@...
 

You really have to try this out and see. I can only answer from what I read in the ref docs.

> Do I need to ConfiguredGraphFactory.close(GRAPH) before I update its configuration?
The docs say the binding between graph name and graph instance renews every 20 secs, so maybe this is not necessary.

> What happens to GRAPH_TEMP? Wouldn't it be still pointing to the same storage backend HBase table as GRAPH, i.e. to TABLE_B?
GRAPH_TEMP is just a name in the JanusGraphManager memory. It does not matter.

> if I want to reuse the same scheme, I'd have to have some logic that the next time around I need to renew GRAPH, I have GRAPH_TEMP talk to TABLE_A instead and then switch GRAPH to use TABLE_A, correct?
You are right. I would prefer straight versioning or a timestamp in the tablename, or the reuse of names will bite you some day. Of course, you would drop TABLE_A from the storage backend if not needed anymore.
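
A minimal sketch of the timestamp-in-the-tablename idea, in plain Java. The class name, method name, and naming pattern are hypothetical and not part of JanusGraph; the only assumption is that the result is used as the value of `storage.hbase.table` when building the replacement graph.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derives a fresh storage table name for every rebuild,
// so a table name is never reused for a different generation of the graph.
final class VersionedTableNames {
    // UTC timestamp containing only digits and 'T', which keeps the name simple.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss").withZone(ZoneOffset.UTC);

    static String tableNameFor(String graphName, Instant buildTime) {
        return graphName + "_" + STAMP.format(buildTime);
    }

    public static void main(String[] args) {
        // e.g. the value to put under "storage.hbase.table" for the next rebuild
        System.out.println(tableNameFor("GRAPH", Instant.now()));
    }
}
```

The user-facing graph name stays GRAPH; only the table behind it changes, and old tables can be dropped once nothing points to them.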

Best wishes,   Marc


Re: Gremlin Query to return count for nodes and edges

Vinayak Bali
 

Hi Graham,

Tried it; the output is as follows:
[{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},{"v1":1,"e":1,"v2":1},

I want the count something like {v1: 20, e: 60, v2:10} or {v:30, e: 60}

Thanks & Regards,
Vinayak
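
Each row in the output above is one matched path, and `by(count())` counts the single element selected for each label, hence the repeated 1s. The requested totals are instead the number of distinct vertices across `a` and `b` and the number of distinct edges in `e`. The following plain-Java sketch shows that aggregation on the client side. It assumes the rows carry element ids rather than counts (for example by selecting with `by(T.id)`, which is an assumption and not something tried in this thread); the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side aggregation over rows shaped like {a: id, e: id, b: id}.
final class PathCounts {
    static Map<String, Integer> count(List<Map<String, Object>> rows) {
        Set<Object> vertices = new HashSet<>();
        Set<Object> edges = new HashSet<>();
        for (Map<String, Object> row : rows) {
            vertices.add(row.get("a")); // start vertex of the path
            vertices.add(row.get("b")); // end vertex of the path
            edges.add(row.get("e"));    // edge between them
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        result.put("v", vertices.size()); // unique vertices in a and b together
        result.put("e", edges.size());    // unique edges
        return result;
    }
}
```

For large result sets it is usually preferable to let the server do the counting in the traversal itself; the sketch only pins down what the desired numbers mean.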


On Tue, Feb 23, 2021 at 3:00 PM Graham Wallis <graham_wallis@...> wrote:
Hi Vinayak

You could do this:

g.V().hasLabel('A').as('a').outE().as('e').inV().as('b').select('a','e','b').by(count())

That should produce something like:

==>{a=1, e=1, b=1}

Best regards,
 Graham

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: Gremlin Query to return count for nodes and edges

Graham Wallis <graham_wallis@...>
 

Hi Vinayak

You could do this:

g.V().hasLabel('A').as('a').outE().as('e').inV().as('b').select('a','e','b').by(count())

That should produce something like:

==>{a=1, e=1, b=1}

Best regards,
 Graham

Graham Wallis
IBM Open Software
Internet: graham_wallis@...    
IBM, Hursley Park, Hursley, Hampshire SO21 2JN







From:        "Vinayak Bali" <vinayakbali16@...>
To:        janusgraph-users@...
Date:        23/02/2021 09:11
Subject:        [EXTERNAL] Re: [janusgraph-users] Gremlin Query to return count for nodes and edges
Sent by:        janusgraph-users@...




Hi Marc,

I am using the following query to return the results.

g.V().hasLabel('A').as('a').outE().as('e').inV().as('b').select('a','e','b').by(valueMap().by(unfold()))
Want the count of unique nodes in a and b together and e i.e number of edges.
Please modify this query to get the required output.

Thanks & Regards,
Vinayak

On Tue, Feb 23, 2021 at 1:08 PM <hadoopmarc@...> wrote:

Hi Vinayak,

Try:

g.V().project('v', 'vcount', 'ecount').by(identity()).by(count()).by(bothE().count())

Best wishes,    Marc







Re: Gremlin Query to return count for nodes and edges

Vinayak Bali
 

Hi Marc,

I am using the following query to return the results.
g.V().hasLabel('A').as('a').outE().as('e').inV().as('b').select('a','e','b').by(valueMap().by(unfold()))
Want the count of unique nodes in a and b together and e i.e number of edges.
Please modify this query to get the required output.

Thanks & Regards,
Vinayak

On Tue, Feb 23, 2021 at 1:08 PM <hadoopmarc@...> wrote:
Hi Vinayak,

Try:

g.V().project('v', 'vcount', 'ecount').by(identity()).by(count()).by(bothE().count())

Best wishes,    Marc


Re: Gremlin Query to return count for nodes and edges

hadoopmarc@...
 

Hi Vinayak,

Try:

g.V().project('v', 'vcount', 'ecount').by(identity()).by(count()).by(bothE().count())

Best wishes,    Marc


Gremlin Query to return count for nodes and edges

Vinayak Bali
 

Hi All,

I want to return the count of nodes and edges returned by a query. I tried a few queries, but they are not working. Can someone please share a single query that returns both counts?

Thanks & Regards,
Vinayak  



Re: Changing graphname at runtime

Diglio A. Simoni
 

OK, so that I'm clear, what you're suggesting is that I try something like:

// Create and open main graph
map = new HashMap();
map.put("storage.backend", "hbase");
map.put("storage.hostname", "xx.xx.xx.xx,yy.yy.yy.yy,zz.zz.zz.zz");
map.put("storage.hbase.table", "TABLE_A");
map.put("graph.graphname", "GRAPH");
configuration = new MapConfiguration(map);
configuration.setDelimiterParsingDisabled(true);
ConfiguredGraphFactory.createConfiguration(configuration);
graph = ConfiguredGraphFactory.open("GRAPH");

// Create, open and update replacement graph
map = new HashMap();
map.put("storage.backend", "hbase");
map.put("storage.hostname", "xx.xx.xx.xx,yy.yy.yy.yy,zz.zz.zz.zz");
map.put("storage.hbase.table", "TABLE_B");
map.put("graph.graphname", "GRAPH_TEMP");
configuration = new MapConfiguration(map);
configuration.setDelimiterParsingDisabled(true);
ConfiguredGraphFactory.createConfiguration(configuration);
graph = ConfiguredGraphFactory.open("GRAPH_TEMP");

// Modify GRAPH_TEMP and when it's time to make that the live one:
map = new HashMap();
map.put("storage.hbase.table", "TABLE_B");
ConfiguredGraphFactory.updateConfiguration("GRAPH", map);
graph = ConfiguredGraphFactory.open("GRAPH");


But that raises some additional questions:
  • Do I need to ConfiguredGraphFactory.close(GRAPH) before I update its configuration?
  • What happens to GRAPH_TEMP? Wouldn't it be still pointing to the same storage backend HBase table as GRAPH, i.e. to TABLE_B?
  • if I want to reuse the same scheme, I'd have to have some logic that the next time around I need to renew GRAPH, I have GRAPH_TEMP talk to TABLE_A instead and then switch GRAPH to use TABLE_A, correct?


Re: Janusgraph 0.5.3 potential memory leak

rngcntr
 

Sadly, that quick fix is not only fast but also incorrect. It requires iterating the `Iterator` twice, which causes incorrect results.
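
A plain-Java illustration of this pitfall (not JanusGraph code): it assumes the quick fix has to wrap one already-existing `Iterator` in a lambda to obtain an `Iterable`, and that the consumer then traverses that `Iterable` twice. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class IteratorReuseDemo {
    // Collects everything one pass over the Iterable yields.
    static <T> List<T> drain(Iterable<T> source) {
        List<T> out = new ArrayList<>();
        for (T item : source) {
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("a", "b", "c");

        // One Iterator instance wrapped in a lambda: every call to iterator()
        // hands back the same object, which is exhausted after the first pass.
        Iterator<String> single = entries.iterator();
        Iterable<String> oneShot = () -> single;
        System.out.println(drain(oneShot)); // [a, b, c]
        System.out.println(drain(oneShot)); // []

        // A real Iterable creates a new Iterator per pass, so both passes agree.
        Iterable<String> fresh = entries::iterator;
        System.out.println(drain(fresh)); // [a, b, c]
        System.out.println(drain(fresh)); // [a, b, c]
    }
}
```

Any consumer that makes a first pass (for instance to size a buffer) and a second pass to copy the data would see an empty second pass with the wrapped single `Iterator`.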


Re: Changing graphname at runtime

hadoopmarc@...
 

Is this what you are looking for (it includes an explicit example):

https://docs.janusgraph.org/basics/configured-graph-factory/#updating-configurations

You can version your graph in the storage and indexing backends, but keep the graph name facing the end user the same.

Best wishes,    Marc


Changing graphname at runtime

Diglio A. Simoni
 

Note: I cross-posted this to https://groups.google.com/g/gremlin-users so that I can reach a broader audience, so if you're a member of both groups, please accept my apologies!

Hello,
 
I have a situation where I have a consumer API that talks to JanusGraph. It's configured to connect to a ConfiguredGraphFactory graph with graphname "Graph_A". I want to update Graph_A in *its totality*, which implies dropping it and recreating it from scratch. The problem is that such a process takes a long time, and I don't want the system to be down while Graph_A is being rebuilt. So I have another ConfiguredGraphFactory graph with graphname Graph_B. I take whatever time is required to create Graph_B, and when it's done, I need to stop the system, change the configuration of the API to now connect to Graph_B and restart the system.
 
But I don't want to do that.
 
Instead what I'd like to do is: when Graph_B is ready, I would ConfiguredGraphFactory.drop('Graph_A') and *rename* Graph_B to Graph_A.
 
Is that possible? If not, does anybody have another solution? This is akin to what one does in computer graphics and double buffering....
 


Re: Inconsistent composite index status after transaction failure

Boxuan Li
 

Try this (v is the ghost vertex):
// Remove every relation still attached to the ghost vertex
Iterator<JanusGraphRelation> iterator = v.query().noPartitionRestriction().relations().iterator();
while (iterator.hasNext()) {
    iterator.next();
    iterator.remove();
}
// Then remove the vertex itself
v.remove();

On Feb 17, 2021, at 7:24 PM, simone3.cattani@... wrote:

I already tried to delete it, both by referencing the actual vertex id and by using the query based on the "inconsistent" index, but in both cases it doesn't work (the drop function returns normally, as if it had nothing to delete).


Re: REINDEXING Big Graph

hadoopmarc@...
 

Hi Abhay,

The Hadoop client picks up configs from the JVM classpath. So, simply add /etc/hadoop/conf (or some other folder that keeps the hdfs-site.xml and other cluster configs) to your classpath. I have never done this myself for the indexing MapReduce jobs, nor seen it on this forum, so you may well encounter further barriers...
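
A small plain-Java sketch for checking whether the cluster config files are actually visible on the classpath before starting the reindex job. The class and method names are hypothetical; the file names are the standard Hadoop client config files, and which of them a given job really needs is an assumption.

```java
import java.net.URL;

final class HadoopConfCheck {
    // Standard Hadoop client config file names (which ones matter here is an assumption).
    static final String[] CONFIG_FILES = {
        "core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml"
    };

    // Returns the classpath location of the resource, or null if it is not visible.
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        for (String name : CONFIG_FILES) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

Running it with the same classpath as the job, for example `java -cp /etc/hadoop/conf:<your jars> HadoopConfCheck`, should print a location for each file. If the files are missing, the job falls back to defaults, which is consistent with the `LocalJobRunner` lines in the console log of the original question.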

HTH,    Marc


REINDEXING Big Graph

Abhay Pandit
 

Hi Team,
I am currently trying to REINDEX using Hadoop MapReduce, following the example in the JanusGraph documentation:
https://docs.janusgraph.org/index-management/index-reindexing/#reindex-example-on-mapreduce

I wrote my implementation in Java. It runs fine, but only in local mode.
To run in cluster mode I need to pass Hadoop configurations, but it is not clear to me from the documentation how to pass any external configuration to run on Hadoop or on a YARN cluster.
If anybody has tried this against a big graph, e.g. one with a billion nodes, can you guide me on this?

My Java implementation:
JanusGraph janusGraph = JanusGraphFactory.open(janusConfig);
JanusGraphManagement management = janusGraph.openManagement();
JanusGraphIndex graphIndex = management.getGraphIndex("AddressId");
MapReduceIndexManagement mapReduceIndexManagement = new MapReduceIndexManagement(janusGraph);
ScanMetrics scanMetrics = mapReduceIndexManagement.updateIndex(graphIndex, SchemaAction.REINDEX).get();

janusConfig:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1
storage.port=9042
storage.keyspace=janusgraph
cache.db-cache = false
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

Console log:
[INFO] 2021-02-17 13:37:55,173 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.LocalJobRunner - {} -
[INFO] 2021-02-17 13:37:56,141 task-1 org.apache.hadoop.mapreduce.Job - {} -  map 14% reduce 0%
[INFO] 2021-02-17 13:37:57,384 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.Task - {} - Task:attempt_local67526867_0001_m_000035_0 is done. And is in the process of committing
[INFO] 2021-02-17 13:37:57,384 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.LocalJobRunner - {} - map
[INFO] 2021-02-17 13:37:57,384 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.Task - {} - Task 'attempt_local67526867_0001_m_000035_0' done.
[INFO] 2021-02-17 13:37:57,385 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.Task - {} - Final Counters for attempt_local67526867_0001_m_000035_0: Counters: 16

Thanks,
Abhay


Re: Janusgraph 0.5.3 potential memory leak

rngcntr
 

Hey @mad, thanks for your benchmark code! I ran a few experiments with it today and figured out that creating an Iterable from the Iterator seems to already solve the problem. I added the following function to the benchmark:

@Benchmark
public void iterator_iterable(Blackhole bh) {
    EntryList result = StaticArrayEntryList.ofStaticBuffer(() -> entries.iterator(), StaticArrayEntry.ENTRY_GETTER);
    bh.consume(result);
}


And the results look very promising:

Benchmark                                        (size)  (valueSize)   Mode  Cnt     Score   Error  Units
StaticArrayEntryListBenchmark.iterable            10000           50  thrpt    2  3954.258          ops/s
StaticArrayEntryListBenchmark.iterable            10000         1000  thrpt    2   305.872          ops/s
StaticArrayEntryListBenchmark.iterable            10000         5000  thrpt    2    85.734          ops/s
StaticArrayEntryListBenchmark.iterable           100000           50  thrpt    2   224.861          ops/s
StaticArrayEntryListBenchmark.iterable           100000         1000  thrpt    2    19.816          ops/s
StaticArrayEntryListBenchmark.iterable           100000         5000  thrpt    2     7.058          ops/s
StaticArrayEntryListBenchmark.iterator            10000           50  thrpt    2  1619.764          ops/s
StaticArrayEntryListBenchmark.iterator            10000         1000  thrpt    2   142.065          ops/s
StaticArrayEntryListBenchmark.iterator            10000         5000  thrpt    2    27.785          ops/s
StaticArrayEntryListBenchmark.iterator           100000           50  thrpt    2   181.209          ops/s
StaticArrayEntryListBenchmark.iterator           100000         1000  thrpt    2    17.115          ops/s
StaticArrayEntryListBenchmark.iterator           100000         5000  java.lang.OutOfMemoryError: Java heap space
StaticArrayEntryListBenchmark.iterator_iterable   10000           50  thrpt    2  3557.666          ops/s
StaticArrayEntryListBenchmark.iterator_iterable   10000         1000  thrpt    2   331.978          ops/s
StaticArrayEntryListBenchmark.iterator_iterable   10000         5000  thrpt    2    87.827          ops/s
StaticArrayEntryListBenchmark.iterator_iterable  100000           50  thrpt    2   241.963          ops/s
StaticArrayEntryListBenchmark.iterator_iterable  100000         1000  thrpt    2    20.257          ops/s
StaticArrayEntryListBenchmark.iterator_iterable  100000         5000  thrpt    2     7.278          ops/s

 

 

The throughput is almost as high as using Iterable and even the OOM does not occur anymore. If that also fixes the original problem stated at the beginning of this thread, the solution is just a  () ->  away!


Re: Inconsistent composite index status after transaction failure

simone3.cattani@...
 

I already tried to delete it, both by referencing the actual vertex id and by using the query based on the "inconsistent" index, but in both cases it doesn't work (the drop function returns normally, as if it had nothing to delete).


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

Ted Wilmes
 

Great! We've done 20-minute slots in the past; that may work well for this if we do around 10-15 minutes of presentation and 5-10 for discussion/Q&A. In reality, that'll just scratch the surface, but it will give folks some jumping-off points.

For others, what graph algorithms have you operationalized or would like to? What worked, what didn't? Real world use cases (successes or failures!) are always of keen interest to the group.

--Ted

On Tue, Feb 16, 2021 at 1:35 AM <hadoopmarc@...> wrote:
Hi Ted,

Yes, a short overview of OLAP questions from the user list sounds like a good idea and is easy to prepare. It need not be long; 10 minutes including a few questions for clarifications would do. If you want to discuss these issues in more depth, more time is needed, of course.

Best wishes,       Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

hadoopmarc@...
 

Hi Ted,

Yes, a short overview of OLAP questions from the user list sounds like a good idea and is easy to prepare. It need not be long; 10 minutes including a few questions for clarifications would do. If you want to discuss these issues in more depth, more time is needed, of course.

Best wishes,       Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

Ted Wilmes
 

Hey Dylan,
Thanks for the links. That's a promising set of projects. I think a brief survey of OLAP graph engines that may be applicable to JG users would be very interesting. In addition to looking at alternative OLAP engines, I think the question of integration is an interesting one. For example, TP Spark pulls data directly out of JG. I find this attractive from the standpoint of not having to maintain a mirror image of the OLTP graph, but we pay a large performance penalty. Alternatively, a mirror image OLAP graph can be maintained, likely using the same change feed that JG ingests. A third alternative, which may be feasible using the in-memory storage backend and one of the darker corners of the JG code base, the FulgoraGraphComputer, could possibly be made to work in a zero-copy fashion. Anyway, this is not as exciting as the selection/development of the OLAP engine itself, but I think the integration will play a big part in ease of use and adoption.

--Ted

On Fri, Feb 12, 2021 at 4:49 PM Dylan Bethune-Waddell <dylan.bethune.waddell@...> wrote:
Hi Ted,

Great idea Ted. Wanted to mention KatanaGraph (website, github). It's basically a port of this codebase called Galois (website, github). Appears to be a group of UT Austin researchers taking their impressive results (paper) solving various OLAP graph computing problems into open source (3-Clause BSD License). From what I've gathered poking around the new codebase vs. old, and the demo server you can launch a notebook on, they aim to commercialize the distributed GPU aspect of Galois after getting it production ready as katana "enterprise". The guts of it exist in the Galois codebase and they do refer to it - could be a good conversation to have in the JanusGraph community.

Seems like KatanaGraph and cool stuff like rapids.ai spark-rapids are all using the Apache Arrow format, which might be an integration to consider. Another interesting project is the GraphBLAS, which is a spec but now has concrete implementations including this one which is from a "competitor" to KatanaGraph, gunrock. IIRC the gunrock direction-optimized BFS code is faster on power-law graphs than the implementation of BFS in katana/galois, which might be interesting in terms of how Gremlin expects to do its OLAP traversals.

Best,
Dylan

On Thu, Feb 11, 2021 at 11:51 AM <hadoopmarc@...> wrote:
Hi Ted,

Most probably you recognize my nickname from the answers I provided on this user forum on OLAP attempts with JanusGraph. I also co-authored:

https://tinkerpop.apache.org/docs/current/recipes/#connected-components

showing the need to test the scalability of graph algorithms.
I am interested in participating in the meeting and I am open to suggestions on where contributions are most needed (no new material, so either as part of a panel or presenting old material).

Best wishes,     Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

Ted Wilmes
 

Hi Marc,
Yes, I most definitely recognize your nickname and have been a beneficiary of many of your answers, blog posts, etc. Glad to hear you're interested in participating. You've been prolific on the lists and I'm wondering if you have a top 5 list of OLAP items that you see people have trouble with over and over? A brief presentation of your responses and pointers to what you've already written would probably be very helpful for folks who are attempting the Spark path.

Thanks,
Ted


On Thu, Feb 11, 2021 at 10:51 AM <hadoopmarc@...> wrote:
Hi Ted,

Most probably you recognize my nickname from the answers I provided on this user forum on OLAP attempts with JanusGraph. I also co-authored:

https://tinkerpop.apache.org/docs/current/recipes/#connected-components

showing the need to test the scalability of graph algorithms.
I am interested in participating in the meeting and I am open to suggestions on where contributions are most needed (no new material, so either as part of a panel or presenting old material).

Best wishes,     Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

Dylan Bethune-Waddell
 

Hi Ted,

Great idea Ted. Wanted to mention KatanaGraph (website, github). It's basically a port of this codebase called Galois (website, github). Appears to be a group of UT Austin researchers taking their impressive results (paper) solving various OLAP graph computing problems into open source (3-Clause BSD License). From what I've gathered poking around the new codebase vs. old, and the demo server you can launch a notebook on, they aim to commercialize the distributed GPU aspect of Galois after getting it production ready as katana "enterprise". The guts of it exist in the Galois codebase and they do refer to it - could be a good conversation to have in the JanusGraph community.

Seems like KatanaGraph and cool stuff like rapids.ai spark-rapids are all using the Apache Arrow format, which might be an integration to consider. Another interesting project is the GraphBLAS, which is a spec but now has concrete implementations including this one which is from a "competitor" to KatanaGraph, gunrock. IIRC the gunrock direction-optimized BFS code is faster on power-law graphs than the implementation of BFS in katana/galois, which might be interesting in terms of how Gremlin expects to do its OLAP traversals.

Best,
Dylan

On Thu, Feb 11, 2021 at 11:51 AM <hadoopmarc@...> wrote:
Hi Ted,

Most probably you recognize my nickname from the answers I provided on this user forum on OLAP attempts with JanusGraph. I also co-authored:

https://tinkerpop.apache.org/docs/current/recipes/#connected-components

showing the need to test the scalability of graph algorithms.
I am interested in participating in the meeting and I am open to suggestions on where contributions are most needed (no new material, so either as part of a panel or presenting old material).

Best wishes,     Marc
