
Script16.groovy: 2: unable to resolve class StandardJanusGraph

Vinayak Bali
 

Hi All,

I am using a batch processing script to load data into the graph. With JanusGraphFactory the script worked as expected, but the same code fails with ConfiguredGraphFactory. The error is as follows:

Script16.groovy: 2: unable to resolve class StandardJanusGraph
 @ line 2, column 3.
     private StandardJanusGraph graph;
     ^

Script16.groovy: 7: unable to resolve class StandardJanusGraph
 @ line 7, column 22.
     public CsvImporter(StandardJanusGraph graph, int batchNumber, List csvRecords
                        ^

2 errors
Type ':help' or ':h' for help.
Display stack trace? [yN]y
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script16.groovy: 2: unable to resolve class StandardJanusGraph
 @ line 2, column 3.
     private StandardJanusGraph graph;
     ^

Script16.groovy: 7: unable to resolve class StandardJanusGraph
 @ line 7, column 22.
     public CsvImporter(StandardJanusGraph graph, int batchNumber, List csvRecords
                        ^

2 errors

at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:311)
at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:980)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:647)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:596)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:390)
at groovy.lang.GroovyClassLoader.access$300(GroovyClassLoader.java:89)
at groovy.lang.GroovyClassLoader$5.provide(GroovyClassLoader.java:330)
at groovy.lang.GroovyClassLoader$5.provide(GroovyClassLoader.java:327)
at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:147)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:325)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:309)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:251)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine$GroovyCacheLoader.lambda$load$0(GremlinGroovyScriptEngine.java:819)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.CompletableFuture.asyncSupplyStage(CompletableFuture.java:1618)
at java.util.concurrent.CompletableFuture.supplyAsync(CompletableFuture.java:1843)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine$GroovyCacheLoader.load(GremlinGroovyScriptEngine.java:817)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine$GroovyCacheLoader.load(GremlinGroovyScriptEngine.java:812)
at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalLoadingCache.lambda$new$0(BoundedLocalCache.java:3117)
at com.github.benmanes.caffeine.cache.LocalCache.lambda$statsAware$0(LocalCache.java:144)
at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$16(BoundedLocalCache.java:1968)
at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1892)
at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:1966)
at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1949)
at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:113)
at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:67)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.getScriptClass(GremlinGroovyScriptEngine.java:567)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:374)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:233)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

A small part of the code:

class CsvImporter implements Runnable {

  private StandardJanusGraph graph;
  private int batchNumber;
  private List csvRecords;
  private int lastRecord;
  private Closure processor;

  public CsvImporter(StandardJanusGraph graph, int batchNumber, List csvRecords
    , int lastRecord, Closure processor) {

    this.graph = graph;
    this.batchNumber = batchNumber
    this.csvRecords = csvRecords
    this.lastRecord = lastRecord
    this.processor = processor
  }

Initial code to access the graph using JanusGraphFactory:
graph = ctx.graph = JanusGraphFactory.open('/home/fusionops/janusgraph-full-0.5.2/conf/graph1.properties')
Updated code to access the graph using ConfiguredGraphFactory:
graph = ctx.graph = ConfiguredGraphFactory.open("merck_graph_explorer_demo")
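For reference, StandardJanusGraph lives in org.janusgraph.graphdb.database; a minimal sketch of one possible fix (an assumption, not verified against this setup) is an explicit import at the top of the script, or typing the field against the JanusGraph interface (org.janusgraph.core.JanusGraph), which resolves wherever ConfiguredGraphFactory does:

import org.janusgraph.graphdb.database.StandardJanusGraph  // make the implementation class resolvable in the script
// alternatively, avoid the implementation class entirely:
// private org.janusgraph.core.JanusGraph graph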

Thanks & Regards,
Vinayak


Drivers for Java connectivity

Vinayak Bali
 

Hi All,

We are connecting to janusgraph from Java through an API. The API executes the query and returns the data. We frequently face memory issues while using the gremlin driver (org.apache.tinkerpop.gremlin.driver).
The size occupied by the graph in the db/cassandra/data directory is 690 MB, while loading the entire graph through the API and gremlin driver takes approximately 2 GB.
Backend: Cassandra
Janusgraph: 0.5.2
Please suggest an alternative, more efficient way to perform the desired operation.
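For context, a minimal sketch of driver usage that streams results instead of collecting them all at once (host, port, and query are placeholders; resultIterationBatchSize is a standard Cluster option that controls how many results arrive per response batch):

import org.apache.tinkerpop.gremlin.driver.Cluster

// smaller batches let the client process results incrementally
cluster = Cluster.build('localhost').port(8182).resultIterationBatchSize(64).create()
client = cluster.connect()
// iterate the ResultSet lazily rather than materialising the whole graph in memory
client.submit("g.V().hasLabel('person').valueMap().limit(1000)").stream().forEach { println it }
cluster.close()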

Thanks & Regards,
Vinayak


Goaway - Errors with BigTable

Assaf Schwartz
 

Hi everyone,

We are using JanusGraph 0.5.3 on top of BigTable.
Over the past day we have been experiencing inconsistent performance issues, while seeing some errors in the JG logs that we are not familiar with:

com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.async.AbstractRetryingOperation - Retrying failed call. Failure #1, got: Status{code=UNAVAILABLE, description=HTTP/2 error code: NO_ERROR
Received Goaway load_shed, cause=null} on channel 7.
Trailers: Metadata(bigtable-channel-id=7)

Does anyone have experience with these kinds of issues? The problem isn't consistent and can cause some traversals to time out; obviously, naively increasing the timeout is a band-aid I'd like to avoid. Are there any configurations that need to be done?

Thanks in advance!


Re: Question on Design and Suitability of janus graph

hadoopmarc@...
 

Hi Basanth Gowda,

The fit between your use case and janusgraph does not seem particularly good. The main reasons for my opinion are:
  1. your data model still seems rather simple (website visitors and groups); it could easily be handled in a relational model using SQL
  2. analytical queries are important to you, while this aspect of janusgraph is still in its infancy (you might want to check the OLAP meetup being planned)
I am curious to see whether other opinions pop up!

Best wishes,    Marc


Re: JanusGraph meetup topic discussion - graph OLAP & algorithms

hadoopmarc@...
 

Hi Ted,

Saw these two interesting threads on the dev list the other day:
https://lists.lfaidata.foundation/g/janusgraph-dev/topic/performance_optimization/80653320
https://lists.lfaidata.foundation/g/janusgraph-dev/topic/performance_issue_large/80821002

Apparently, the people at Zeotab do analytics on janusgraph at a massive scale by having many Spark executors individually connect to janusgraph (skipping SparkGraphComputer/HadoopGraph). It would be interesting to have them at the meeting and hear what kind of analytic queries they do, in particular:
  • how do they access the table with janusgraph ids?
  • how do they aggregate the results of individual spark partitions into the end result of the gremlin query?
  • how do they retrieve vertex data for steps 2, 3, ... of the traversal (Spark shuffle vs. each executor retrieving additional vertex data from janusgraph)?
Best wishes,    Marc


Question on Design and Suitability of janus graph

Basanth Gowda <basanth.gowda@...>
 


We are embarking on a new initiative and wanted to get community input on whether JanusGraph is a good fit. Any alternative suggestions are also welcome.

  • New records are added regularly; assume every visitor to the website.
  • A visitor can become a customer.
  • A customer can join a group voluntarily. A group a customer joins could be high cardinality or low cardinality.
  • Customers will be added to groups by the system based on characteristics (for example age group, male/female/other, country, etc.).
  • Customers can move among groups or cease to be part of a group.
  • Customers would be part of a group for a given duration, for example while an event is happening.
  • A customer has multiple unique identifiers by which they can be looked up (customerId, subscriptionId, etc.).
We are looking at 300 - 400 million entries.

We are expecting a decent volume of OLAP requests, such as:
  • Give me all the customers that belong to a group
  • Give me all customers that belonged to a group but no longer do
  • Give me all customers that belong to one group but also belong to another group
  • Give me related customers (referrals)
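For illustration, these could map to Gremlin traversals roughly like the following (a hedged sketch assuming hypothetical 'customer' and 'group' vertex labels and 'memberOf'/'referred' edge labels):

// all customers that belong to a given group
g.V().has('group', 'name', 'G1').in('memberOf')
// customers in group G1 that also belong to group G2
g.V().has('group', 'name', 'G1').in('memberOf').where(__.out('memberOf').has('group', 'name', 'G2'))
// related customers via a referral edge
g.V().has('customer', 'customerId', 'C1').both('referred')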

We have Elasticsearch, Cassandra, and others in use in our ecosystem.

thank you,
BG


Re: Authentication in JanusGraph Server

hadoopmarc@...
 

Hi Graham,

This was certainly one to investigate over the weekend. Whereas you started investigating from the inside of janusgraph, I started from the user perspective, and this is what I did:
  1. I replicated your steps on janusgraph-full-0.5.3 and hit the same issue (incorrect username/password)
  2. I also replicated your steps on janusgraph-0.3.2 to be sure no bugs were introduced in later versions, but still the same issue
  3. I checked the old user list and found https://groups.google.com/g/janusgraph-users/c/iVqlUS2zQbc/m/vmf8PgEQBAAJ  This was interesting: someone had problems with the credentialsDb and only got it working after switching from a Berkeleyje backend to an HBase backend. This was a pattern: your issue was also with Berkeleyje.
  4. In the authentication section of gremlin-server.yaml I changed the properties file for the credentialsDb to one using cql-es with a keyspace "credentials" and... remote authentication worked.
This was a nasty one, but the effort you had already taken inspired me to do my part. I will file an issue report for this on GitHub.

Best wishes,   Marc


Re: How to circumvent transaction cache?

hadoopmarc@...
 

Hi Timon,

Adding to Ted's answer, I can imagine that your new data enters your pipeline from a Kafka queue. With a micro-batching solution, e.g. Apache Spark streaming, you could pre-shuffle your data per micro-batch to make sure that all data relating to a branch end up in a single partition. After that, a single thread can handle that partition in one JanusGraph transaction. This approach seems to fit your use case better than trying to circumvent ACID limits in a tricky way.

Best wishes,    Marc


Re: How to circumvent transaction cache?

Boxuan Li
 

Hi Timon,

As I mentioned earlier, the only way I can think of (assuming you are not concerned about the consistency of data storage as Ted mentioned) is to modify JanusGraph source code:

In CacheVertex class, there is a data structure, protected final Map<SliceQuery, EntryList> queryCache.

What you could do is to add a method to that class:

public void refresh() {
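    // drop all cached slice-query results so subsequent reads go to the storage backend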
    queryCache.clear();
}

And then you can call refresh before you want to load new value from the storage rather than cache:

((CacheVertex) v1).refresh();

Hope this helps,
Boxuan




Re: How to circumvent transaction cache?

Ted Wilmes
 

Hi Timon,
Jumping in late on this one but I wanted to point out that even if you could read it prior to committing to check if your constraint is maintained, most of the JG storage layers do not provide ACID guarantees. FoundationDB is the one distributed option, and BerkeleyDB can do it for a single instance setup. Since you do not have ACID guarantees in most cases, I think you could still have a case where another transaction commits prior to your commit even though you saw isPublished = false when you check it. One possible way around this without ACID would be to process all mutations for a branch on one thread, effectively single threading access to it so that you could know that no other user was writing to the branch while you were reading.

--Ted



Re: How to circumvent transaction cache?

timon.schneider@...
 

Thanks for your suggestion, but the consistency setting does not solve my problem.


Re: How to circumvent transaction cache?

Nicolas Trangosi <nicolas.trangosi@...>
 

Hi Timon,
It seems that you can force JG to re-read elements just before commit.

I have never tried the mgmt.setConsistency option, but it may help you.
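A hedged sketch of that option (assuming the isPublished property key from this thread; ConsistencyModifier.LOCK makes JanusGraph acquire a lock and re-check the value at commit):

mgmt = graph.openManagement()
isPublished = mgmt.getPropertyKey('isPublished')
// LOCK tells JanusGraph to obtain a lock on this property at commit time
mgmt.setConsistency(isPublished, ConsistencyModifier.LOCK)
mgmt.commit()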

Regards,
Nicolas

Le ven. 5 mars 2021 à 10:20, <timon.schneider@...> a écrit :


Thanks for your reply.

The issue is that we need to refresh some vertices mid-transaction. Rolling back is not an option, as that would erase edits we are making in our transaction. Disabling the transaction cache could be one solution. Using a threaded tx could be an option as well, as that transaction does see edits made by other users, as opposed to the original transaction:
A starts a transaction and makes edits, does not commit yet
B makes an edit to vertex X and commits
A cannot see B's edit to vertex X unless A commits or rolls back.
Again, it is possible to read X by using a ThreadedTx, but I'm interested in whether there's another way to refresh a vertex mid-transaction.

Kr,
Timon



--
Nicolas Trangosi
Lead back
+33 (0)6 77 86 66 44





This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, DCbrain is not liable for messages that have been modified, changed or falsified. Thank you.


Re: How to circumvent transaction cache?

timon.schneider@...
 
Edited

Thanks for your reply.

The issue is that we need to refresh some vertices mid-transaction. Rolling back is not an option, as that would erase edits we are making in our transaction. Disabling the transaction cache could be one solution. Using a threaded tx could be an option as well, as that transaction does see edits made by other users, as opposed to the original transaction:
A reads vertex X, then starts a transaction and makes edits, does not commit yet
B may or may not edit X
A continues editing, and before committing it needs to make sure vertex X was not changed by B, or else it rolls back.
Again, it is possible to read X by using a ThreadedTx, but I'm interested in whether there's another way to refresh a vertex mid-transaction.

Kr,
Timon


Re: Authentication in JanusGraph Server

grahamwallis.dev@...
 

Hi @hadoopmarc,

Thanks for replying, and no apology needed - it's a good question. Although I failed to mention it in my question, I did set the credentials to ('graham', 'sasl-password') in the sasl-remote.yaml file when testing with JanusGraph as the credentials store.

Setting a breakpoint in the server, I could see the correct credentials being received, and the credentials store traversal looked fine; but no vertex was returned.

All the best
  Graham


Re: How to delete ghost vertices and ghost edges?

Boxuan Li
 


On Thursday, March 4, 2021 at 4:42 PM, <vamsi.lingala@...> wrote:

gremlin> g.V(6389762617560).valueMap()
==>{}
gremlin>
gremlin> g.V().hasLabel("MAID").has("madsfid","sfmsdlk").outE("MAIH1").as("e").inV().as("v").select("e", "v").by(valueMap())
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}


Re: How to circumvent transaction cache?

Boxuan Li
 

Hi Timon,

I don’t even think you will be able to disable tx-cache by using createThreadedTx(), or equivalently, newTransaction()/buildTransaction(). Unfortunately, as long as your transaction is not readOnly(), the effective vertex transaction size will be Math.max(100, cache.tx-cache-size).

To the best of my knowledge, you can only modify the JanusGraph source code to completely disable the transaction-level cache. A workaround would be to always start a new transaction to check whether the value has changed.
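For illustration, a minimal sketch of that workaround (branchId is a placeholder, not from the thread):

// open a fresh transaction so the read bypasses the main transaction's cache
tx = graph.newTransaction()
try {
    published = tx.traversal().V(branchId).values('isPublished').next()
    // if published is true, roll back the main transaction instead of committing
} finally {
    tx.rollback()  // the check is read-only, so nothing needs to be committed
}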

Best regards,
Boxuan

On Wednesday, March 3, 2021 at 9:11 PM, <timon.schneider@...> wrote:

Our application has transactions editing many vertices representing elements of a branch. The branch is also represented by a vertex that has a boolean property isPublished. Before committing such a transaction, we need to know whether another user has set the isPublished property on the branch vertex to true, in which case the transaction should be rolled back.

Here’s the problem:
* User A reads the branch vertex but doesn’t close transaction
* User B changes the isPublished property to true and commits (while A is still making changes)
* User A read locks the vertex with an external locking API
* User A queries the branch vertex again (to make sure isPublished is still false) in the same thread but gets the old values because of the transaction cache.
Now user A can commit data even though the branch isPublished is true.

I know it’s possible to use createThreadedTx() to circumvent the ThreadLocal transaction cache. However, such refreshes will be very common in our application and ideally we would be able to execute a refresh within the main transaction to minimise complexity and workarounds. Is this possible? And if not, are there any possibilities to turn off transaction cache entirely?

Thanks in advance,
Timon


How to delete ghost vertices and ghost edges?

vamsi.lingala@...
 

gremlin> g.V(6389762617560).valueMap()
==>{}
gremlin>
gremlin> g.V().hasLabel("MAID").has("madsfid","sfmsdlk").outE("MAIH1").as("e").inV().as("v").select("e", "v").by(valueMap())
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
==>{e={}, v={}}
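For a handful of ghosts, one hedged approach is simply dropping them by id in a normal OLTP transaction (a sketch using the id printed above; large numbers of ghosts are usually cleaned up with an OLAP scan job instead):

// drop the apparent ghost vertex together with its incident edges, then commit
g.V(6389762617560).drop().iterate()
g.tx().commit()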


Re: Gremlin Query to return count for nodes and edges

Vinayak Bali
 

Hi Marc,

The backend used is Cassandra. I was wondering whether we can load the data from Cassandra's data store into the in-memory backend to speed up the process.
I tried OLAP by configuring Hadoop and Spark with the help of the references shared in the documentation. A simple query to retrieve one node from the graph took around 5 minutes.
Based on your experience, please share the steps to follow to solve the issue.

Thanks & Regards,
Vinayak

On Wed, Feb 24, 2021 at 9:32 PM <hadoopmarc@...> wrote:
Hi Vinayak,

Speeding up your query depends on your setup. 15,000 vertices/second is already fast. Is this the janusgraph inmemory backend? Or ScyllaDB?

In a perfect world, not there yet, your query would profit from parallelization (OLAP). JanusGraph supports both the withComputer() and withComputer(SparkGraphComputer) start steps, but the former is undocumented and the performance gains of the latter are often disappointing.
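For reference, a minimal sketch of the SparkGraphComputer route (assuming a Hadoop graph properties file such as the read-cql example shipped with the distribution):

// OLAP traversal backed by Spark; expect high latency even for small queries
graph = GraphFactory.open('conf/hadoop-graph/read-cql.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()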

Best wishes,    Marc


Re: Authentication in JanusGraph Server

hadoopmarc@...
 

Sorry for asking, but you did not state it explicitly: you did modify your sasl-remote.yaml file to reflect the new ('graham', 'sasl-password') credentials, did you?

Marc