
Integrate CustomVertexProgram to janusgraph

Nikita Pande
 

What is the current method for integrating a new VertexProgram into JanusGraph? Is it to add the code to TinkerPop and then build the JanusGraph code? Is Java the only supported language?


Re: Kerberos authentication of gremlin console with Janusgraph server

hadoopmarc@...
 

Kerberos has a reputation for being complex. I would first try to get the pure TinkerPop example working, using the TinkerPop Gremlin Server and Gremlin Console distributions. Also, check the Gremlin Server log output whenever Gremlin Console reports an exception. The command graph = JanusGraphFactory.open('') is not the best first test in Gremlin Console; g.V().limit(5) is better.


Re: Kerberos authentication of gremlin console with Janusgraph server

Nikita Pande
 

Hi Marc,

In my case it's both: Gremlin Server acts as a client to a Kerberized HBase, and it also acts as a Kerberized server to Gremlin Console and other clients. I have already tested HBase separately with JanusGraph, and it works fine. Now I want to add Kerberos authentication of JanusGraph Server on top of this, so that Gremlin Console has to authenticate.

Thanks,
Nikita


Re: Kerberos authentication of gremlin console with Janusgraph server

hadoopmarc@...
 

You are mixing up two procedures:
  1. Gremlin Server's Krb5Authenticator is for authenticating Gremlin clients towards Gremlin Server. Apparently, you do not want it, so remove it from your configs.
  2. Apparently, you are trying to have Gremlin Server authenticate against HBase. This has nothing to do with Gremlin Server's Krb5Authenticator. If the keytab for Gremlin Server is OK and a kinit was done on the Gremlin Server host with the right user, the HBase client of janusgraph-hbase, running on the Gremlin Server host, should be able to access the TGT and authenticate to HBase.

Best wishes,     Marc


Re: Kerberos authentication of gremlin console with Janusgraph server

Nikita Pande
 

Thanks for recommending this approach. However, I get the following error when running:
gremlin> def list = client.submit("g.V()").all().get()
>>> CCacheInputStream: readFlags()
get normal credential
org.apache.tinkerpop.gremlin.driver.exception.ResponseException: Failure to initialize security context

Similarly, with the approach I was using earlier, I get inconsistent responses:
1. :remote connect tinkerpop.server conf/remote.yaml
2. :remote console
3. graph = JanusGraphFactory.open("/root/janusgraph-0.6.0/conf/janusgraph-hbase.properties")
Step 3 sometimes works fine and returns the configured graph. However, when I repeat steps 1 and 2, it sometimes fails with the error "Failure to initialize security context".
 


Re: MongoDB or ElasticSearch as storage backend?

hadoopmarc@...
 

Adding new types of storage or indexing backends to JanusGraph is not straightforward. So, unless you are a seasoned Java developer with some time to spare, you will have to use the available storage and indexing backends, as listed in https://docs.janusgraph.org/.

I am not aware of any out-of-the-box or cloud solutions to transfer data from MongoDB to JanusGraph.

Best wishes,    Marc


MongoDB or ElasticSearch as storage backend?

ucheozoemena@...
 

Hi everyone, is it possible to use MongoDB or Elasticsearch as the storage backend? I'm new to JanusGraph, so please bear with me and feel free to explain anything obvious that I may be missing. It appears MongoDB and JanusGraph are entirely different types of databases, but I'm wondering whether there is a known way to make JanusGraph work with data that is already stored in MongoDB. Elasticsearch is a common indexing backend for JanusGraph and is commonly used with MongoDB as well, so I'm considering whether it's possible to use all three together. If MongoDB can't be used, how about Elasticsearch as both the storage backend and the indexing backend?


Re: Kerberos authentication of gremlin console with Janusgraph server

hadoopmarc@...
 

Connecting gremlin console to gremlin server goes like:

cluster = Cluster.build(<hostname>).jaasEntry(<entry in gremlin-jaas.conf file>).protocol(<serverPrincipalName>).create()
See https://tinkerpop.apache.org/docs/current/reference/#connecting-via-drivers for how to use the cluster object.
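The <entry in gremlin-jaas.conf file> placeholder names a JAAS login configuration entry. As a sketch of what such an entry contains, the stdlib-only Java below builds the in-memory equivalent of a file entry named GremlinConsole (an assumed name) that uses Krb5LoginModule with useTicketCache=true, so the TGT obtained by kinit is reused, and doNotPrompt=true. It does not contact Gremlin Server; whatever name you pass to jaasEntry(...) has to match the entry name in your own gremlin-jaas.conf.

```java
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

public class JaasEntrySketch {
    // Assumed entry name; it must match the string passed to Cluster.build(...).jaasEntry(...).
    static final String ENTRY_NAME = "GremlinConsole";

    // In-memory equivalent of a gremlin-jaas.conf file with a single entry.
    static Configuration consoleConfiguration() {
        AppConfigurationEntry krb5 = new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                Map.of(
                        "useTicketCache", "true", // reuse the TGT that kinit put in the ticket cache
                        "doNotPrompt", "true"));  // fail rather than prompt for a password
        return new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                // null signals "no entry with this name", as documented for Configuration.
                return ENTRY_NAME.equals(name) ? new AppConfigurationEntry[] {krb5} : null;
            }
        };
    }

    public static void main(String[] args) {
        AppConfigurationEntry entry = consoleConfiguration().getAppConfigurationEntry(ENTRY_NAME)[0];
        System.out.println(entry.getLoginModuleName() + " " + entry.getOptions());
    }
}
```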

Did you try this already?


Re: Kerberos authentication of gremlin console with Janusgraph server

Nikita Pande
 

"Krb5Authenticator runs inside Gremlin Server and authenticates users of gremlin clients (e.g. Gremlin Console)." This is configured as part of gremlin-server.yaml.
When I run just "graph" on its own from Gremlin Console, I get the error "Authenticator is not ready to handle requests".
When I run JanusGraphFactory instead, it passes. I'm not sure why this happens. My current service keytab has two principals, and I configured one of them. Could that cause a problem?


Re: Kerberos authentication of gremlin console with Janusgraph server

hadoopmarc@...
 

Hi Nikita,

I do not understand: Krb5Authenticator runs inside Gremlin Server and authenticates users of gremlin clients (e.g. Gremlin Console). Why would you run JanusGraphFactory in the Gremlin Console if the graph is already opened server side?

Can you also check the logs of Gremlin Server and see if they give any additional hint about Krb5Authenticator?

Best wishes,   Marc


Kerberos authentication of gremlin console with Janusgraph server

Nikita Pande
 

Hi team,
 
Kerberos authentication of gremlin console with janusgraph version 0.6.0
 
I am facing an issue when trying to configure Kerberos authentication of Gremlin Console with JanusGraph as per https://tinkerpop.apache.org/docs/current/reference/#krb5authenticator. Currently, after kinit, I start Gremlin Console and run some traversals. Sometimes I get "Authenticator is not ready to handle requests", while at other times the command graph = JanusGraphFactory.open('') goes through. It's very inconsistent. Please help me resolve this.
 
Thanks and Regards, 
Nikita


Re: Gremlin giving stale response

Aman <amandeep.srivastava1996@...>
 

Hi Marc,

Using the read-only strategy is a really good suggestion. Let me try that, thanks!

Regards,
Aman


On Tue, 15 Mar, 2022, 12:45 pm , <hadoopmarc@...> wrote:
Hi Aman,

Without saying that I understand everything of the JanusGraph behaviour that you describe, I can add the following:
  • when accessing the graph via JanusGraph server (what you call the gremlin API) every request is a transaction already (unless you use sessions). So, using buildTransaction for defining g in JanusGraph Server is counterintuitive and possibly enters untested territory.
  • Apache TinkerPop has the so-called ReadOnlyStrategy to realize the behaviour you want. Can you try that instead?
Best wishes,    Marc


Re: Gremlin giving stale response

hadoopmarc@...
 

Hi Aman,

Without saying that I understand everything of the JanusGraph behaviour that you describe, I can add the following:
  • when accessing the graph via JanusGraph server (what you call the gremlin API) every request is a transaction already (unless you use sessions). So, using buildTransaction for defining g in JanusGraph Server is counterintuitive and possibly enters untested territory.
  • Apache TinkerPop has the so-called ReadOnlyStrategy to realize the behaviour you want. Can you try that instead?
Best wishes,    Marc
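Marc's first point can be illustrated with a toy model in plain Java (Store and Tx are hypothetical classes, not JanusGraph internals): a transaction that caches what it reads and is shared across requests, like the g built once at startup with graph.buildTransaction().readOnly().start().traversal(), keeps answering from its cache, while a transaction created per request sees a newly written edge. This only models the general mechanism; it does not explain why the vertex count behaved differently in Aman's test.

```java
import java.util.HashMap;
import java.util.Map;

public class TxSnapshotSketch {
    // Toy storage backend: number of edges per vertex id (not JanusGraph's storage layer).
    static final class Store {
        private final Map<String, Integer> edgeCounts = new HashMap<>();

        void addEdge(String vertexId) { edgeCounts.merge(vertexId, 1, Integer::sum); }

        int read(String vertexId) { return edgeCounts.getOrDefault(vertexId, 0); }
    }

    // Toy transaction: remembers everything it has read, so repeated reads inside
    // the same transaction are answered from its cache, not from the store.
    static final class Tx {
        private final Store store;
        private final Map<String, Integer> cache = new HashMap<>();

        Tx(Store store) { this.store = store; }

        int edgesOf(String vertexId) { return cache.computeIfAbsent(vertexId, store::read); }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Tx sharedTx = new Tx(store);                       // like a transaction opened once at startup
        System.out.println(sharedTx.edgesOf("v1"));        // 0
        store.addEdge("v1");                               // written by another transaction
        System.out.println(sharedTx.edgesOf("v1"));        // still 0: stale cached read
        System.out.println(new Tx(store).edgesOf("v1"));   // 1: fresh per-request transaction
    }
}
```

For the read-only requirement itself, Marc's second point would amount to binding g in empty-sample.groovy to something like graph.traversal().withStrategies(ReadOnlyStrategy.instance()) rather than to a pre-started transaction; treat that line as an untested suggestion.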


Re: JanusGraph database cache on distributed setup

Boxuan Li
 

Hi Wasantha,

It's great to hear that you have solved the previous problem.

Regarding contributing to the community, I would suggest you create a GitHub issue first, describing the problem and your approach there. This is not required but recommended. Then, you could create a pull request linking to that issue (note that you would also be asked to sign an individual CLA or corporate CLA once you create your first pull request in JanusGraph).

I am not 100% sure I understand your question, but I guess you are asking what exactly is stored in that cache. Basically, that cache stores the raw data fetched from the storage backend. It does not have to be a vertex; it could be vertex properties or edges. It might also contain deserialized data (see the getCache() method in Entry.java). Note that in your case, since your cache is not local, it might be a better idea to store only the raw data and not the deserialized data, to reduce network overhead. To achieve that, you could override the Entry::setCache method and make it do nothing. If you are interested in learning more about the "raw data", I wrote a blog post, Data layout in JanusGraph, that you might find useful.
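A minimal plain-Java sketch of the setCache suggestion, assuming a simplified entry type: CachedEntry and RawOnlyEntry are hypothetical stand-ins, not JanusGraph's Entry interface or its exact getCache()/setCache() signatures. Overriding setCache as a no-op means an entry never carries a deserialized form, so a non-local cache only has to ship and store the raw bytes.

```java
import java.util.Arrays;

public class RawOnlyEntrySketch {
    // Hypothetical stand-in for a cached backend entry (not JanusGraph's Entry interface):
    // the raw bytes from storage plus an optional deserialized form attached later.
    static class CachedEntry {
        private final byte[] raw;
        private Object deserialized;

        CachedEntry(byte[] raw) { this.raw = raw.clone(); }

        byte[] raw() { return raw.clone(); }

        Object getCache() { return deserialized; }

        void setCache(Object value) { this.deserialized = value; }
    }

    // Variant for a remote cache: setCache does nothing, so an entry only ever holds
    // its raw bytes and nothing deserialized has to travel over the network.
    static class RawOnlyEntry extends CachedEntry {
        RawOnlyEntry(byte[] raw) { super(raw); }

        @Override
        void setCache(Object value) {
            // Intentionally ignored; readers deserialize from raw() when they need to.
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {1, 2, 3};
        CachedEntry local = new CachedEntry(bytes);
        CachedEntry remote = new RawOnlyEntry(bytes);
        local.setCache("decoded");
        remote.setCache("decoded");
        System.out.println(local.getCache() + " " + remote.getCache()); // decoded null
        System.out.println(Arrays.equals(local.raw(), remote.raw()));   // true
    }
}
```

The trade-off is that every reader of a RawOnlyEntry pays the deserialization cost again, in exchange for smaller cache payloads.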

Hope this helps.
Boxuan


Gremlin giving stale response

Aman <amandeep.srivastava1996@...>
 

Hi,

I've built a service around JanusGraph to ingest/retrieve data. When I ingest a new edge, the same is not reflected on the gremlin endpoint of JG. However, ingested vertices count is updated correctly.

Here's what I'm doing:
1. Use API to create 2 new vertices
2. Use API to create a new edge
3. Hit gremlin API to get vertex count -> shows 2 correctly
4. Hit gremlin API to get edge count -> shows 0

When I restart gremlin server, I'm able to see correct edge count via gremlin API (i.e. 1 in above example). My hunch is that gremlin API is using a stale transaction somewhere, hence returning incorrect data.

I tried setting 2 values of g in empty-sample.groovy:
1. globals << [g: graph.traversal()] -> This seems to give the correct number of vertices and edges always, so no issues with this one.
2. globals << [g: graph.buildTransaction().readOnly().start().traversal()] -> This seems to be having the above mentioned inconsistency issue.

I want the gremlin API to be read-only, so that all ingestion happens via my custom-built APIs (I validate a few things before persisting data). I have the following questions:

1. Why does the second value of g cause inconsistent results only for edges? If it's continuing on the same transaction, shouldn't staleness disrupt vertex count results too?
2. How can I set the value of g so that traversals are always read-only and every request starts a new transaction, ending it after the response has been returned to the client?

Would appreciate it if the group can provide inputs.

Regards,
Aman


Re: JanusGraph database cache on distributed setup

washerath@...
 

Hi Boxuan,

We were able to overcome the sync issue across multiple JG instances after modifying void invalidate(StaticBuffer key, List<CachableStaticBuffer> entries) as suggested. We had a few performance issues to resolve, which took considerable effort.

I am happy to contribute the implementation to the community, since it's almost done. Please advise on how to do that.

One more question on the cache:

private final Cache<KeySliceQuery,EntryList> cache;

Given the initialization above, the cache maps each specific query to the result of that query. It does not cache all the traversed vertices. Is my understanding correct?
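Wasantha's reading can be checked against a toy version of a cache keyed by query. In the plain-Java sketch below (Java 16+ for the record), SliceQuery is a hypothetical stand-in for KeySliceQuery (a row key plus a column range), and the cache maps each exact query to its result, the same shape as Cache<KeySliceQuery,EntryList>. Two different slices of the same key are separate cache entries, so in the toy, invalidating a key has to drop every slice cached for it. This illustrates the keying only, not JanusGraph's actual cache classes or expiry logic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SliceCacheSketch {
    // Hypothetical stand-in for KeySliceQuery: a row key plus a column range.
    record SliceQuery(String key, String sliceStart, String sliceEnd) {}

    // Same shape as Cache<KeySliceQuery, EntryList>: one result per exact query.
    private final Map<SliceQuery, List<String>> cache = new HashMap<>();
    private int backendReads = 0;

    List<String> getSlice(SliceQuery query) {
        return cache.computeIfAbsent(query, this::readFromBackend);
    }

    // Hypothetical backend read; returns a placeholder instead of real columns.
    private List<String> readFromBackend(SliceQuery query) {
        backendReads++;
        return List.of(query.key() + "[" + query.sliceStart() + ", " + query.sliceEnd() + ")");
    }

    // Drops every cached slice of one row key, e.g. after that key was written.
    void invalidate(String key) {
        cache.keySet().removeIf(q -> q.key().equals(key));
    }

    int backendReads() { return backendReads; }

    int cachedQueries() { return cache.size(); }

    public static void main(String[] args) {
        SliceCacheSketch c = new SliceCacheSketch();
        SliceQuery edges = new SliceQuery("v1", "edges-start", "edges-end");
        SliceQuery props = new SliceQuery("v1", "props-start", "props-end");
        c.getSlice(edges);
        c.getSlice(edges);  // exact same query: served from the cache
        c.getSlice(props);  // same key, different slice: another backend read
        System.out.println(c.backendReads() + " reads, " + c.cachedQueries() + " cached queries");
        c.invalidate("v1");
        System.out.println(c.cachedQueries() + " cached queries after invalidate");
    }
}
```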

Thanks
Wasantha


Connective predicate with id throws an IllegalArgumentException (Invalid condition)

toom@...
 

Hello,
 
If a query contains a connective predicate with string ids, it fails with an IllegalArgumentException:
 
gremlin> g.V().or(has(T.id, "1933504"), has(T.id, "2265280"), has(T.id, "2027592")) 
java.lang.IllegalArgumentException: Invalid condition: [2265280, 2027592, 1933504]                               
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:217)                  
        at org.janusgraph.graphdb.query.graph.GraphCentricQueryBuilder.has(GraphCentricQueryBuilder.java:148)
        at org.janusgraph.graphdb.query.graph.GraphCentricQueryBuilder.has(GraphCentricQueryBuilder.java:67)
        at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.addConstraint(JanusGraphStep.java:168)
        at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.buildGlobalGraphCentricQuery(JanusGraphStep.java:156)
        at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.buildGlobalGraphCentricQuery(JanusGraphStep.java:131)
 
The condition is invalid because the predicate value is not a List [1] but a HashSet. HasContainer changes the value to a HashSet because it detects string ids [2].
 
The same query with numerical id works.
 
I think the value of a connective predicate could be any Collection, so "isValidCondition" could check against this superclass instead.
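Toom's proposal can be illustrated with a simplified, plain-Java model of the type check. The two methods below are hypothetical stand-ins for isValidCondition, not JanusGraph's code (the real method linked in [1] may check more than the type): one accepts only a List, the other any Collection, so the HashSet that HasContainer builds from string ids would pass.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConnectiveConditionSketch {
    // Behaviour described in the post: only a List is accepted as the condition value.
    static boolean acceptsListOnly(Object condition) {
        return condition instanceof List;
    }

    // Proposed behaviour: check against the Collection superclass instead.
    static boolean acceptsAnyCollection(Object condition) {
        return condition instanceof Collection;
    }

    public static void main(String[] args) {
        // With string ids, HasContainer turns the values into a HashSet.
        Set<Object> asSet = new HashSet<>(List.of("1933504", "2265280", "2027592"));
        List<Object> asList = List.of("1933504", "2265280", "2027592");
        System.out.println(acceptsListOnly(asSet) + " " + acceptsAnyCollection(asSet));   // false true
        System.out.println(acceptsListOnly(asList) + " " + acceptsAnyCollection(asList)); // true true
    }
}
```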
 
Regards,
 
Toom.
 
[1] https://github.com/JanusGraph/janusgraph/blob/v0.6.1/janusgraph-core/src/main/java/org/janusgraph/graphdb/predicate/ConnectiveJanusPredicate.java#L45-L47
[2] https://github.com/apache/tinkerpop/blob/3.5.1/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/util/HasContainer.java#L69-L72


Re: Janusgraph 0.6.0 cassandra connection issues caused by: com.datastax.oss.driver.api.core.AllNodesFailedException: Could not reach any contact point, make sure you've provided valid addresses (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=5960d2ce): [com.datastax.oss.driver.api.core.connection.ConnectionInitException: [JanusGraph Session|control|connecting...]

Boxuan Li
 

Hi Krishna,

Sorry for the late reply. Can you try one thing:

storage.hostname = cass01,cass01

See if this config works or not. If this works, then likely the problem is not with JanusGraph. Otherwise, there might be a bug, and it would be helpful if you could provide your complete configuration and steps.

You might also want to try the last stable version, 0.5.3, and see if you have the same issue.

Best,
Boxuan

On Mar 7, 2022, at 5:50 AM, krishna.sailesh2@... wrote:

Hi hadoopmarc

It's a typo in the post; I only typed it that way so that people would understand I am passing it as a string.
In the actual code, I am using a properties file, reading those properties into a configuration2 object, and passing that to JanusGraph.

Thanks
Krishna Jalla


Re: JanusGraph Best Practice to Store the Data

Boxuan Li
 

Hi,

There are a few factors you might want to consider:

1. An increase in your transaction-level cache and database-level cache memory usage.
2. Cassandra does not handle large column values well. 100-500 KB is far below the hard limit, but some say that values of this size can also lead to performance issues (disclaimer: I’ve never tried it myself).
3. Serialization and deserialization cost. To reduce storage and network overhead, JanusGraph encodes and compresses your string value (see StringSerializer). That being said, I believe this overhead should (usually) still be much smaller than an additional network call (if you store docValue somewhere else).

The best option depends on your use case and your testing, of course.
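If testing shows that inline documents are too costly, the alternative raised in the question is to keep only a reference. A minimal plain-Java sketch of that choice, assuming a hypothetical size cut-off (INLINE_LIMIT_BYTES, to be set from your own measurements) and a hypothetical external document store represented by a Map: docSize is computed as the UTF-8 byte length, and docValue is only put into the Segment vertex properties when it is within the limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class DocPlacementSketch {
    // Hypothetical cut-off; choose it from your own measurements.
    static final int INLINE_LIMIT_BYTES = 64 * 1024;

    // Returns the properties for a 'Segment' vertex. Large documents go to the
    // (hypothetical) external store and only docId/docSize stay in the graph.
    static Map<String, Object> segmentProperties(String docId, String docValue,
                                                 Map<String, String> externalStore) {
        int docSize = docValue.getBytes(StandardCharsets.UTF_8).length;
        Map<String, Object> props = new HashMap<>();
        props.put("docId", docId);
        props.put("docSize", docSize);
        if (docSize <= INLINE_LIMIT_BYTES) {
            props.put("docValue", docValue);     // small enough: store inline
        } else {
            externalStore.put(docId, docValue);  // too large: keep only the reference
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        Map<String, Object> small = segmentProperties("d1", "{\"a\":1}", store);
        Map<String, Object> large = segmentProperties("d2", "x".repeat(200_000), store);
        System.out.println(small.containsKey("docValue") + " " + large.containsKey("docValue")); // true false
        System.out.println(store.keySet()); // [d2]
    }
}
```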

Best,
Boxuan


On Mar 9, 2022, at 8:22 AM, kaintharinder@... wrote:


Hi Team,

We are running a JanusGraph + Cassandra combination, storing data through Gremlin commands from the Java API.
We are thinking of saving the full JSON document in the graph alongside the relationships.

The Gremlin query looks like this:
g.addV('Segment').property("docId",docId).property("docValue",docValue).property("docSize",docSize)

The "docValue" value will be large, in the range of 100-500 KB. It is a JSON document.
We want to understand whether it is good practice to save full documents in the graph or whether we should store only references.


JanusGraph Best Practice to Store the Data

kaintharinder@...
 

Hi Team,

We are running a JanusGraph + Cassandra combination, storing data through Gremlin commands from the Java API.
We are thinking of saving the full JSON document in the graph alongside the relationships.

The Gremlin query looks like this:
g.addV('Segment').property("docId",docId).property("docValue",docValue).property("docSize",docSize)

The "docValue" value will be large, in the range of 100-500 KB. It is a JSON document.
We want to understand whether it is good practice to save full documents in the graph or whether we should store only references.
