Re: Drivers for Java connectivity

hadoopmarc@...
 

Hi Vinayak,

What exactly is the "desired operation"? It is not clear to me. In other words, what result do you expect after running the desired operation?

Best wishes,    Marc


Re: Script16.groovy: 2: unable to resolve class StandardJanusGraph

hadoopmarc@...
 

Hi Vinayak,

What may confuse you is that the Gremlin Console does a lot of imports under the hood, but it does not import all JanusGraph classes. So, you can solve this in two ways:
  1. Preferred: use the underlying interface when specifying the type, so line 3 becomes private Graph graph;
  2. Optional: do an explicit import of StandardJanusGraph (see the sketch below the console output)
gremlin> StandardJanusGraph
No such property: StandardJanusGraph for class: groovysh_evaluate
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> Graph
==>interface org.apache.tinkerpop.gremlin.structure.Graph
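
For option 2, a minimal sketch of the explicit import (this assumes the usual package layout of JanusGraph 0.5.x; adjust the package if your version differs):

import org.janusgraph.graphdb.database.StandardJanusGraph

class CsvImporter implements Runnable {
  private StandardJanusGraph graph;
  // ...
}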


Best wishes,    Marc


Re: How to circumvent transaction cache?

timon.schneider@...
 

Hi all,


On Fri, Mar 5, 2021 at 05:32 PM, Ted Wilmes wrote:
Since you do not have ACID guarantees in most cases, I think you could still have a case where another transaction commits prior to your commit even though you saw isPublished = false when you check it. One possible way around this without ACID would be to process all mutations for a branch on one thread, effectively single threading access to it so that you could know that no other user was writing to the branch while you were reading.
I actually aim to keep the system ACID compliant. The only thing (which I struggle to implement in JG) is that the edits can only be committed after a lock and read are done on the branch vertex's isPublished property. The problem is that JG doesn't offer select-for-update functionality. I need to read the branch vertex to get its id and lock it, but while I'm getting it the isPublished property can be set to true by another user. Getting the vertex, locking it, and refreshing the data could be an option, but it's not supported by JG.

Isn't this a shortcoming of JG that many users have issues with?

I think the single thread solution you suggest is not an option as our application is a meta data editor where multiple users should be able to edit elements of a branch simultaneously.

@Bo Xuan Li
I'm very much concerned with the consistency of the data. The check on the branch vertex is just a read operation necessary to guarantee that the branch is not published at the point of persisting the edits.


Re: Authentication in JanusGraph Server

grahamwallis.dev@...
 

Thanks for looking at it Marc, and for raising the issue.

As you say, we can work around the issue by using a different persistent store, but I must admit to being intrigued as to why it doesn't seem to work with BerkeleyDB. If I get time I will do some more digging and will add any comments to the above issue.

Thanks again
  Graham


Script16.groovy: 2: unable to resolve class StandardJanusGraph

Vinayak Bali
 

Hi All,

I am using a batch processing script to load data into the graph. With JanusGraphFactory the scripts worked as expected, but the same code is not working with ConfiguredGraphFactory. The error is as follows:

Script16.groovy: 2: unable to resolve class StandardJanusGraph
 @ line 2, column 3.
     private StandardJanusGraph graph;
     ^

Script16.groovy: 7: unable to resolve class StandardJanusGraph
 @ line 7, column 22.
     public CsvImporter(StandardJanusGraph graph, int batchNumber, List csvRecords
                        ^

2 errors
Type ':help' or ':h' for help.
Display stack trace? [yN]y
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script16.groovy: 2: unable to resolve class StandardJanusGraph
 @ line 2, column 3.
     private StandardJanusGraph graph;
     ^

Script16.groovy: 7: unable to resolve class StandardJanusGraph
 @ line 7, column 22.
     public CsvImporter(StandardJanusGraph graph, int batchNumber, List csvRecords
                        ^

2 errors

at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:311)
at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:980)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:647)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:596)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:390)
at groovy.lang.GroovyClassLoader.access$300(GroovyClassLoader.java:89)
at groovy.lang.GroovyClassLoader$5.provide(GroovyClassLoader.java:330)
at groovy.lang.GroovyClassLoader$5.provide(GroovyClassLoader.java:327)
at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:147)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:325)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:309)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:251)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine$GroovyCacheLoader.lambda$load$0(GremlinGroovyScriptEngine.java:819)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.CompletableFuture.asyncSupplyStage(CompletableFuture.java:1618)
at java.util.concurrent.CompletableFuture.supplyAsync(CompletableFuture.java:1843)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine$GroovyCacheLoader.load(GremlinGroovyScriptEngine.java:817)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine$GroovyCacheLoader.load(GremlinGroovyScriptEngine.java:812)
at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalLoadingCache.lambda$new$0(BoundedLocalCache.java:3117)
at com.github.benmanes.caffeine.cache.LocalCache.lambda$statsAware$0(LocalCache.java:144)
at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$16(BoundedLocalCache.java:1968)
at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1892)
at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:1966)
at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1949)
at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:113)
at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:67)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.getScriptClass(GremlinGroovyScriptEngine.java:567)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:374)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:233)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

A small part of the code:

class CsvImporter implements Runnable {

  private StandardJanusGraph graph;
  private int batchNumber;
  private List csvRecords;
  private int lastRecord;
  private Closure processor;

  public CsvImporter(StandardJanusGraph graph, int batchNumber, List csvRecords
    , int lastRecord, Closure processor) {

    this.graph = graph;
    this.batchNumber = batchNumber
    this.csvRecords = csvRecords
    this.lastRecord = lastRecord
    this.processor = processor
  }

Initial code to access the graph using JanusGraphFactory:
graph = ctx.graph = graph = JanusGraphFactory.open('/home/fusionops/janusgraph-full-0.5.2/conf/graph1.properties')

Updated code to access the graph using ConfiguredGraphFactory:
graph = ctx.graph = graph = ConfiguredGraphFactory.open("merck_graph_explorer_demo")

Thanks & Regards,
Vinayak


Drivers for Java connectivity

Vinayak Bali
 

Hi All,

We connect to JanusGraph from Java through an API. The API executes a query and returns the data. We frequently face memory issues while using the Gremlin driver (org.apache.tinkerpop.gremlin.driver).
The size occupied by the graph in the db/cassandra/data directory is 690 MB, but when we load the entire graph through the API and the Gremlin driver, it takes approximately 2 GB.
Backend: Cassandra
Janusgraph: 0.5.2
Could you suggest a more efficient way to perform the desired operation?

Thanks & Regards,
Vinayak


Goaway - Errors with BigTable

Assaf Schwartz
 

Hi everyone,

We are using JanusGraph 0.5.3 on top of BigTable.
Over the past day we have been experiencing inconsistent performance issues, while seeing some errors in the JG logs that we are not familiar with:

com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.async.AbstractRetryingOperation - Retrying failed call. Failure #1, got: Status{code=UNAVAILABLE, description=HTTP/2 error code: NO_ERROR
Received Goaway load_shed, cause=null} on channel 7.
Trailers: Metadata(bigtable-channel-id=7)

Does anyone have experience with these kinds of issues? The errors are not consistent and can cause some traversals to time out. Obviously, naively increasing the timeout is a band-aid I'd like to avoid. Are there any configurations that need to be done?

Thanks in advance!


Re: Question on Design and Suitability of janus graph

hadoopmarc@...
 

Hi Basanth Gowda,

The fit between your use case and janusgraph does not seem particularly good. The main reasons for my opinion are:
  1. your datamodel still seems rather simple (website visitors and groups); it could easily be handled in a relational model using SQL
  2. analytical queries are important to you, while this aspect of janusgraph is still in its infancy (you might want to check the OLAP meetup that is being planned)
I am curious to see whether other opinions pop up!

Best wishes,    Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

hadoopmarc@...
 

Hi Ted,

Saw these two interesting threads on the dev list the other day:
https://lists.lfaidata.foundation/g/janusgraph-dev/topic/performance_optimization/80653320
https://lists.lfaidata.foundation/g/janusgraph-dev/topic/performance_issue_large/80821002

Apparently, the people at Zeotab do analytics on janusgraph at a massive scale by having many Spark executors individually connect to janusgraph (skipping SparkGraphComputer/HadoopGraph). It would be interesting to have them at the meeting and hear what kind of analytic queries they do, in particular:
  • how do they access the table with janusgraph ids?
  • how do they aggregate the results of individual Spark partitions into the end result of the gremlin query?
  • how do they retrieve vertex data for steps 2, 3, ... of the traversal (Spark shuffle vs. each executor retrieving additional vertex data from janusgraph)?
Best wishes,    Marc


Question on Design and Suitability of janus graph

Basanth Gowda <basanth.gowda@...>
 


We are embarking on a new initiative and wanted to get community input on whether JanusGraph is a good fit. Any alternative suggestions are also welcome.

  • New records are added regularly. Assume every visitor to the website
  • A visitor can become a customer
  • A customer can join a group voluntarily. A group a customer joins could be high cardinality or low cardinality.
  • A customer will be added to groups by the system based on characteristics (for example age groups, male/female/other, country, etc.)
  • Customers can move among groups or cease to be part of a group
  • Customers would be part of a group for a given duration. For example, they are part of a group while an event is happening.
  • A customer has multiple unique identifiers that they can be looked up by (could be customerId, subscriptionId, etc.)
We are looking at 300 - 400 million entries.

We are expecting a decent amount of OLAP requests like:
  • Give me all the customers that belong to a group
  • Give me all customers that belonged to a group but do not any more
  • Give me all customers that belong to a group but also belong to another group
  • Give me related customers (referrals)
  • Give me related customers (referrals)

We have Elasticsearch, Cassandra and others in use in our ecosystem.

thank you,
BG


Re: Authentication in JanusGraph Server

hadoopmarc@...
 

Hi Graham,

This was certainly one to investigate over the weekend. Whereas you started investigating from the inside of JanusGraph, I started from the user perspective, and this is what I did:
  1. I replicated your steps on janusgraph-full-0.5.3 and hit the same issue (incorrect username/password)
  2. I also replicated your steps on janusgraph-0.3.2 to be sure no bugs were introduced in later versions, but still the same issue
  3. I checked the old user list and found https://groups.google.com/g/janusgraph-users/c/iVqlUS2zQbc/m/vmf8PgEQBAAJ  This was interesting: someone had problems with the credentialsDb and only got it working after switching from a Berkeleyje backend to an HBase backend. This was a pattern: your issue also was with Berkeleyje
  4. In the authentication section of the gremlin-server.yaml I changed the properties file for the credentialsDb to one using cql-es with a keyspace "credentials" and... remote authentication worked
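
For reference, a minimal sketch of such a credentialsDb properties file (hypothetical values, modelled on the cql-es sample properties shipped with the distribution):

storage.backend=cql
storage.hostname=127.0.0.1
storage.cql.keyspace=credentials
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1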
This was a nasty one, but the effort you had already put in inspired me to do my part. I will open an issue report for this on GitHub.

Best wishes,   Marc


Re: How to circumvent transaction cache?

hadoopmarc@...
 

Hi Timon,

Adding to Ted's answer, I can imagine that your new data enter your pipeline from a Kafka queue. With a microbatching solution, e.g. Apache Spark streaming, you could pre-shuffle your data per microbatch to be sure that all data relating to a branch end up in a single partition. After that, a single thread can handle this single partition in one JanusGraph transaction. This approach seems to fit your use case better than trying to circumvent the ACID limits in a tricky way.
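
As a minimal sketch of the idea, without the Spark machinery (microbatch, record.branchId and graph are illustrative assumptions):

// group the records of one microbatch by branch, then let one thread
// handle each branch group in a single JanusGraph transaction
def byBranch = microbatch.groupBy { record -> record.branchId }
byBranch.each { branchId, records ->
    def tx = graph.newTransaction()
    records.each { record ->
        // apply the mutation described by this record to the branch
    }
    tx.commit()
}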

Best wishes,    Marc


Re: How to circumvent transaction cache?

Boxuan Li
 

Hi Timon,

As I mentioned earlier, the only way I can think of (assuming you are not concerned about the consistency of data storage as Ted mentioned) is to modify JanusGraph source code:

In CacheVertex class, there is a data structure, protected final Map<SliceQuery, EntryList> queryCache.

What you could do is to add a method to that class:

public void refresh() {
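    // clear the cached slice-query results so the next read for this vertex goes back to the storage backend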
    queryCache.clear();
}

And then you can call refresh before you want to load a new value from storage rather than from the cache:

((CacheVertex) v1).refresh();

Hope this helps,
Boxuan


On Mar 6, 2021, at 12:32 AM, Ted Wilmes <twilmes@...> wrote:

Hi Timon,
Jumping in late on this one but I wanted to point out that even if you could read it prior to committing to check if your constraint is maintained, most of the JG storage layers do not provide ACID guarantees. FoundationDB is the one distributed option, and BerkeleyDB can do it for a single instance setup. Since you do not have ACID guarantees in most cases, I think you could still have a case where another transaction commits prior to your commit even though you saw isPublished = false when you check it. One possible way around this without ACID would be to process all mutations for a branch on one thread, effectively single threading access to it so that you could know that no other user was writing to the branch while you were reading.

--Ted

On Fri, Mar 5, 2021 at 8:52 AM <timon.schneider@...> wrote:
Thanks for your suggestion, but the consistency setting does not solve my problem.




Re: How to circumvent transaction cache?

Ted Wilmes
 

Hi Timon,
Jumping in late on this one but I wanted to point out that even if you could read it prior to committing to check if your constraint is maintained, most of the JG storage layers do not provide ACID guarantees. FoundationDB is the one distributed option, and BerkeleyDB can do it for a single instance setup. Since you do not have ACID guarantees in most cases, I think you could still have a case where another transaction commits prior to your commit even though you saw isPublished = false when you check it. One possible way around this without ACID would be to process all mutations for a branch on one thread, effectively single threading access to it so that you could know that no other user was writing to the branch while you were reading.
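
A minimal sketch of that single-writer-per-branch idea (illustrative names; graph is assumed to be an open JanusGraph instance):

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// one single-threaded executor per branch id, so all mutations for a given
// branch are serialized while different branches still run in parallel
def branchExecutors = new ConcurrentHashMap<String, ExecutorService>()

def submitForBranch(Map<String, ExecutorService> executors, String branchId, Closure work) {
    executors
        .computeIfAbsent(branchId, { id -> Executors.newSingleThreadExecutor() })
        .submit(work as Runnable)
}

// usage: every edit touching branch 'b-42' queues behind the previous one
submitForBranch(branchExecutors, 'b-42') {
    def tx = graph.newTransaction()
    // check isPublished and apply the mutations within one transaction
    tx.commit()
}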

--Ted

On Fri, Mar 5, 2021 at 8:52 AM <timon.schneider@...> wrote:
Thanks for your suggestion, but the consistency setting does not solve my problem.


Re: How to circumvent transaction cache?

timon.schneider@...
 

Thanks for your suggestion, but the consistency setting does not solve my problem.


Re: How to circumvent transaction cache?

Nicolas Trangosi
 

Hi Timon,
It seems that you can force JG to re-read elements just before commit according to

I have never tried the option mgmt.setConsistency, but it may help you.
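
A minimal sketch of setting that consistency modifier on the property key involved (assuming the key is named isPublished, following the data-consistency example in the JanusGraph docs):

mgmt = graph.openManagement()
isPublished = mgmt.getPropertyKey('isPublished')
mgmt.setConsistency(isPublished, ConsistencyModifier.LOCK)
mgmt.commit()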

Regards,
Nicolas

On Fri, Mar 5, 2021 at 10:20 AM, <timon.schneider@...> wrote:

[Edited Message Follows]

Thanks for your reply.

The issue is that we need to refresh some vertices mid transaction. Rolling back is not an option as that would erase the edits we are making in our transaction. Disabling the transaction cache could be one solution. Using a threaded tx could be an option as well, as that transaction does see edits made by other users, as opposed to the original transaction:
A starts a transaction and makes edits, does not commit yet
B makes an edit to vertex X and commits
A cannot see B's edit to vertex X unless A commits or rolls back.
Again, it is possible to read X by using a ThreadedTx, but I'm interested in whether there's another way to refresh a vertex mid transaction.

Kr,
Timon



--

  

Nicolas Trangosi

Lead back

+33 (0)6 77 86 66 44

This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, DCbrain is not liable for messages that have been modified, changed or falsified. Thank you.


Re: How to circumvent transaction cache?

timon.schneider@...
 
Edited

Thanks for your reply.

The issue is that we need to refresh some vertices mid transaction. Rolling back is not an option as that would erase the edits we are making in our transaction. Disabling the transaction cache could be one solution. Using a threaded tx could be an option as well, as that transaction does see edits made by other users, as opposed to the original transaction:
A reads vertex X, then starts a transaction and makes edits, does not commit yet
B may or may not edit X
A continues editing and, before committing, needs to make sure vertex X was not changed by B, or else rolls back.
Again, it is possible to read X by using a ThreadedTx, but I'm interested in whether there's another way to refresh a vertex mid transaction.
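
For reference, a minimal sketch of reading X in a separate threaded transaction (xId and the property name are illustrative):

def readGraph = graph.tx().createThreadedTx()
try {
    // a threaded tx sees the latest committed state of X, independent of the
    // caches of the long-running editing transaction
    def freshX = readGraph.traversal().V(xId).next()
    def publishedNow = freshX.value('isPublished')
    // compare with the state the edits were based on; roll back the edits if it changed
} finally {
    readGraph.tx().rollback()
}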

Kr,
Timon


Re: Authentication in JanusGraph Server

grahamwallis.dev@...
 

Hi @hadoopmarc,

Thanks for replying, and no apology needed - it's a good question. Although I failed to mention it in my question, I did set the credentials to ('graham','sass-password') in the sasl-remote.yaml file when testing with JanusGraph as the credentials store.

Setting a breakpoint in the server, I could see the correct credentials being received, and the credentials store traversal looked fine; but no vertex was returned.

All the best
  Graham


Re: how to delete Ghost vertices and ghost edges?

Boxuan Li
 


On Thursday, March 4, 2021 at 4:42 PM, <vamsi.lingala@...> wrote:

gremlin> g.V(6389762617560).valueMap()
==>{}
gremlin>
gremlin> g.V().hasLabel("MAID").has("madsfid","sfmsdlk").outE("MAIH1").as("e").inV().as("v").select("e", "v").by(valueMap())
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
