
Re: Coalesce() step behaves differently than Or() step, with exception in sum() step

hadoopmarc@...
 

This question was also asked on the gremlin-users group, where it can best be answered:

https://groups.google.com/g/gremlin-users/c/-oBWRUxF_Hw


Re: Janusgraph connect with MySQL Storage Backend

Madhan Neethiraj
 

Boxuan,

 

Thanks for the pointer to janusbench. I will try this and update this thread in a few days.

 

Madhan

 

 

From: <janusgraph-users@...> on behalf of BO XUAN LI <liboxuan@...>
Reply-To: <janusgraph-users@...>
Date: Monday, February 8, 2021 at 2:46 AM
To: <janusgraph-users@...>
Subject: Re: [janusgraph-users] Janusgraph connect with MySQL Storage Backend

 

Hi Madhan,

 

Have you checked out https://github.com/rngcntr/janusbench ? I never used it personally but it looks interesting and might be helpful.

 

Best regards,

Boxuan



On Feb 8, 2021, at 4:04 PM, Madhan Neethiraj <madhan@...> wrote:

 

Hi Marc,

 

Thanks! This is a work-in-progress implementation, and will need to go through more testing - especially to understand the performance and tuning aspects. Once these are done, this feature can be announced via a blog and/or other channels.

 

Are there JanusGraph tests available to cover the performance aspects of backend storage implementations? It would be a big help.

 

Thanks,

Madhan

 

From: <janusgraph-users@...> on behalf of <hadoopmarc@...>
Reply-To: <janusgraph-users@...>
Date: Sunday, February 7, 2021 at 3:10 AM
To: <janusgraph-users@...>
Subject: Re: [janusgraph-users] Janusgraph connect with MySQL Storage Backend

 

Hi Madhan

It is exciting news that you have just made available this work under the APL2.0 license!!! Standard RDBMS, though not as scalable as the NoSQL storage backends, might provide an easy start for many future JanusGraph projects because any devops team can have one available in the blink of an eye. Of course, this announcement deserves a more prominent spot than just this outpost of the JanusGraph community. Do you have any plans to announce this in more detail in a future blog post or conference contribution? In particular, it would be interesting to hear about the tuning of postgresql for the JanusGraph workloads (many small requests with a need for small round trip delays).

Best wishes,     Marc 

 


Coalesce() step behaves differently than Or() step, with exception in sum() step

cmilowka
 

I am trying to replace coalesce() with or(), which is generally faster, but the or() step fails in the following sum() step:
 
gremlin> graph3 = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph3.io(graphml()).readGraph('data/grateful-dead.xml')
==>null
gremlin> g3 = graph3.traversal()
==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]
gremlin> g3.V('89').values("performances")
==>219
gremlin> g3.V('89').values("performances").sum()
==>219
gremlin> g3.V('89').or(values("performances")).sum()
org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Number
gremlin> g3.V('89').or(values("performances"),__.constant("10000")).sum()
org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Number
gremlin> g3.V('89').coalesce(values("performances"),__.constant("10000")).sum()
==>219
 
Is this an error, or am I doing something wrong?
CM
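For reference, the difference comes from step semantics: or() is a filter step, so when any child traversal yields a result it emits the original incoming traverser (the vertex), while coalesce() is a map step that emits the output of its first productive child. That is why sum() receives a TinkerVertex in the or() variants above. A sketch of keeping or() as a filter while still summing property values (same console session as above):

```groovy
// or() passes the vertex through unchanged, so extract the values afterwards:
g3.V('89').or(values("performances")).values("performances").sum()
// coalesce() itself maps to the property value, so it can feed sum() directly:
g3.V('89').coalesce(values("performances"), __.constant(10000)).sum()
```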
 


Re: Where does Computation happens

hadoopmarc@...
 

Hi Dany,

Some answers:
  1. Computation happens inside JanusGraph (not the storage backend), and JanusGraph runs as part of Gremlin Server.
  2. Yes, a query and its computations run on a single Gremlin Server instance.
  3. There is no hard maximum. If you run a g.V().count(), that is a full table scan; Gremlin Server will not run out of memory, but if you have billions of vertices the query will take days. For the kinds of workloads you worry about, JanusGraph has some initial support for OLAP-type operations, in which all data are loaded by a Spark cluster and the computation results are returned to the JanusGraph instance.
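The OLAP route mentioned in answer 3 can be sketched as follows; this assumes a Hadoop/Spark graph configuration as described in the JanusGraph docs, and the properties file path is illustrative:

```groovy
// Sketch: run an OLAP traversal with SparkGraphComputer instead of on a single Gremlin Server
graph = GraphFactory.open('conf/hadoop-graph/read-cql.properties')  // illustrative config path
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()  // executed as a distributed Spark job; the result returns to the client
```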
Best wishes,   Marc


Where does Computation happens

Dany <danyinb@...>
 

Hi Team,

I have a question about distributed query execution. I have a JanusGraph setup with Cassandra as distributed storage, and I am worried about the performance of complex queries.

1. Where does my query computation happen? Is it in the JanusGraph Gremlin server or on the distributed storage?
2. If the execution is on a Gremlin Server, do all the computations happen on a single Gremlin Server instance, or is the query distributed?
3. What is the maximum number of vertices and edges that a traversal (e.g. a count) can handle on a JanusGraph server?

--
Regards,

Dany


Re: Janusgraph connect with MySQL Storage Backend

Boxuan Li
 

Hi Madhan,

Have you checked out https://github.com/rngcntr/janusbench ? I never used it personally but it looks interesting and might be helpful.

Best regards,
Boxuan

On Feb 8, 2021, at 4:04 PM, Madhan Neethiraj <madhan@...> wrote:

Hi Marc,
 
Thanks! This is a work-in-progress implementation, and will need to go through more testing - especially to understand the performance and tuning aspects. Once these are done, this feature can be announced via a blog and/or other channels.
 
Are there JanusGraph tests available to cover the performance aspects of backend storage implementations? It would be a big help.
 
Thanks,
Madhan
 
From: <janusgraph-users@...> on behalf of <hadoopmarc@...>
Reply-To: <janusgraph-users@...>
Date: Sunday, February 7, 2021 at 3:10 AM
To: <janusgraph-users@...>
Subject: Re: [janusgraph-users] Janusgraph connect with MySQL Storage Backend
 
Hi Madhan

It is exciting news that you have just made available this work under the APL2.0 license!!! Standard RDBMS, though not as scalable as the NoSQL storage backends, might provide an easy start for many future JanusGraph projects because any devops team can have one available in the blink of an eye. Of course, this announcement deserves a more prominent spot than just this outpost of the JanusGraph community. Do you have any plans to announce this in more detail in a future blog post or conference contribution? In particular, it would be interesting to hear about the tuning of postgresql for the JanusGraph workloads (many small requests with a need for small round trip delays).

Best wishes,     Marc 



Re: Janusgraph connect with MySQL Storage Backend

Madhan Neethiraj
 

Hi Marc,

 

Thanks! This is a work-in-progress implementation, and will need to go through more testing - especially to understand the performance and tuning aspects. Once these are done, this feature can be announced via a blog and/or other channels.

 

Are there JanusGraph tests available to cover the performance aspects of backend storage implementations? It would be a big help.

 

Thanks,

Madhan

 

From: <janusgraph-users@...> on behalf of <hadoopmarc@...>
Reply-To: <janusgraph-users@...>
Date: Sunday, February 7, 2021 at 3:10 AM
To: <janusgraph-users@...>
Subject: Re: [janusgraph-users] Janusgraph connect with MySQL Storage Backend

 

Hi Madhan

It is exciting news that you have just made available this work under the APL2.0 license!!! Standard RDBMS, though not as scalable as the NoSQL storage backends, might provide an easy start for many future JanusGraph projects because any devops team can have one available in the blink of an eye. Of course, this announcement deserves a more prominent spot than just this outpost of the JanusGraph community. Do you have any plans to announce this in more detail in a future blog post or conference contribution? In particular, it would be interesting to hear about the tuning of postgresql for the JanusGraph workloads (many small requests with a need for small round trip delays).

Best wishes,     Marc


Re: Janusgraph connect with MySQL Storage Backend

hadoopmarc@...
 

Hi Madhan

It is exciting news that you have just made available this work under the APL2.0 license!!! Standard RDBMS, though not as scalable as the NoSQL storage backends, might provide an easy start for many future JanusGraph projects because any devops team can have one available in the blink of an eye. Of course, this announcement deserves a more prominent spot than just this outpost of the JanusGraph community. Do you have any plans to announce this in more detail in a future blog post or conference contribution? In particular, it would be interesting to hear about the tuning of postgresql for the JanusGraph workloads (many small requests with a need for small round trip delays).

Best wishes,     Marc


Re: How to improve the write speed of Java connection janusgraph?

hadoopmarc@...
 

This blog might be a good start for your further experiments:

https://www.experoinc.com/post/janusgraph-nuts-and-bolts-part-1-write-performance
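Beyond that, a common technique for improving write throughput is committing in batches rather than per element; a minimal sketch (the batch size of 1000 and the label/property names are illustrative, not taken from the blog):

```groovy
// Sketch: batched writes, committing every 1000 additions (batch size is illustrative)
g = graph.traversal()
count = 0
rows.each { row ->                       // 'rows' stands for your input data
    g.addV('item').property('name', row.name).iterate()
    if (++count % 1000 == 0) g.tx().commit()
}
g.tx().commit()                          // commit the remainder
```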

Best wishes,   Marc


Re: Janusgraph connect with MySQL Storage Backend

Madhan Neethiraj
 

Hi Molong,

 

An implementation of a JanusGraph storage backend for RDBMS is available at https://github.com/mneethiraj/janusgraph/tree/rdbms_backend. The implementation uses EclipseLink JPA to access the RDBMS, which enables support for a large number of database flavors, including Postgres, MySQL, Oracle and MS-SQL. A brief document on RDBMS storage backend configuration is available here. I tested with Postgres and it works well; it should work with the other RDBMS flavors supported by EclipseLink as well.

 

Hope you find this useful.

 

Madhan

 


Re: How to improve the write speed of Java connection janusgraph?

ramosrods@...
 

I'm also trying to tune these parameters to process traversals in parallel using a thread pool. The most interesting parameters I've found so far are maxInProcessRequest and maxWaitForConnection. I also found this StackOverflow question helpful: https://stackoverflow.com/questions/41639616/titan-parallel-queries-concurrent-time-out-exception-at-org-apache-tinkerpop-g
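For reference, connection pool parameters like these are set on the TinkerPop driver's Cluster builder; a sketch with illustrative values (note that the builder method for the in-process limit is named maxInProcessPerConnection in the driver):

```groovy
// Sketch: tuning the Gremlin driver connection pool (all values are illustrative)
cluster = Cluster.build('localhost').
        port(8182).
        maxInProcessPerConnection(8).   // max in-flight requests per connection
        maxWaitForConnection(10000).    // ms to wait for a free connection from the pool
        maxConnectionPoolSize(16).
        create()
client = cluster.connect()
```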


Re: Janusgraph 0.5.3 potential memory leak

owner.mad.epa@...
 

I ran a simple benchmark that reproduces the OOM problem with the iterator version:

https://gist.github.com/mad/df729c6a27a7ed224820cdd27209bade

Result

Benchmark                               (size)  (valueSize)   Mode  Cnt     Score     Error  Units
StaticArrayEntryListBenchmark.iterable   10000           50  thrpt    5  2738.330 ± 151.820  ops/s
StaticArrayEntryListBenchmark.iterable   10000         1000  thrpt    5   430.655 ±  34.286  ops/s
StaticArrayEntryListBenchmark.iterable   10000         5000  thrpt    5   116.830 ±   7.664  ops/s
StaticArrayEntryListBenchmark.iterable  100000           50  thrpt    5   206.853 ±  36.894  ops/s
StaticArrayEntryListBenchmark.iterable  100000         1000  thrpt    5    43.632 ±   1.952  ops/s
StaticArrayEntryListBenchmark.iterable  100000         5000  thrpt    5    12.148 ±   0.444  ops/s
StaticArrayEntryListBenchmark.iterator   10000           50  thrpt    5  1447.668 ± 484.155  ops/s
StaticArrayEntryListBenchmark.iterator   10000         1000  thrpt    5   157.839 ±  17.818  ops/s
StaticArrayEntryListBenchmark.iterator   10000         5000  thrpt    5    31.548 ±  10.991  ops/s
StaticArrayEntryListBenchmark.iterator  100000           50  thrpt    5   177.756 ±   4.327  ops/s
StaticArrayEntryListBenchmark.iterator  100000         1000  thrpt    5    25.456 ±   0.736  ops/s
StaticArrayEntryListBenchmark.iterator  100000         5000  java.lang.OutOfMemoryError: Java heap space


Re: No results returned with duplicate Has steps in a vertex-search traversal

Boxuan Li
 

Just created a simple test case but couldn’t reproduce:

@Test
public void testDuplicateMixedIndexQuery() {
    final PropertyKey name = makeKey("name", String.class);
    final PropertyKey prop = makeKey("prop", String.class);
    mgmt.buildIndex("mixed", Vertex.class).addKey(name, Mapping.STRING.asParameter()).buildMixedIndex(INDEX);
    mgmt.buildIndex("mixed2", Vertex.class).addKey(prop, Mapping.STRING.asParameter()).buildMixedIndex(INDEX);
    finishSchema();

    tx.addVertex("name", "bob", "prop", "val");
    tx.commit();

    clopen(option(FORCE_INDEX_USAGE), true);
    newTx();
    assertTrue(tx.traversal().V().has("prop", "val").has("name", P.within("bob", "alice")).hasNext());
    assertTrue(tx.traversal().V().has("prop", "val").has("name", P.within("bob", "alice")).has("name", P.within("bob", "alice")).hasNext());
}

Would it be possible for you to narrow down the scope, e.g. removing other “has” steps in your query? It would be helpful if you could write a piece of minimal reproducible test code.

On Feb 4, 2021, at 3:43 AM, Patrick Streifel <prstreifel@...> wrote:

Hi,

We have a mixed Elasticsearch index that indexes every vertex property on our graph. 
For the above example, all of the fields are indexed as a "keyword" string in Elasticsearch. 
We sometimes get inconsistent behavior. For example if instead of querying by two keyword fields (FullName and PersonSurName), I instead query by one field with a "keyword" mapping type and another with a "text" mapping type, I get the desired result back. 
Again, I can see in the logs the correct records being returned by our Elasticsearch index, regardless of the example. It's just not getting returned by JanusGraph after that.
We have no composite indices on our graph. 

Thanks!


Re: Authentication All the Schema's

Vinayak Bali
 

Hi Marc,

Thank you for the update. Authentication to the graph system as a whole is also not working for me with the configurations shared earlier: even if I don't pass credentials, the API still returns results. Authentication to the graph system as a whole will work for now, until the future versions are released. Please guide me on how to accomplish it.

Thanks & Regards,
Vinayak

On Fri, Feb 5, 2021 at 1:31 PM <hadoopmarc@...> wrote:
Hi Vinayak,

No, this is not possible. TinkerPop/JanusGraph currently only support authentication to the graph system as a whole and do not support authorization. Later this year, the Apache TinkerPop 3.5.0 release will offer authorization, though, which will then also become available through a future JanusGraph release.

https://github.com/apache/tinkerpop/commit/61f7b8c08ac6a1232b460e100b3ff7c91ab4142d

Until then, you will have to use separate Gremlin Server instances.

Best wishes,    Marc


Re: Authentication All the Schema's

hadoopmarc@...
 

Hi Vinayak,

No, this is not possible. TinkerPop/JanusGraph currently only support authentication to the graph system as a whole and do not support authorization. Later this year, the Apache TinkerPop 3.5.0 release will offer authorization, though, which will then also become available through a future JanusGraph release.

https://github.com/apache/tinkerpop/commit/61f7b8c08ac6a1232b460e100b3ff7c91ab4142d

Until then, you will have to use separate Gremlin Server instances.

Best wishes,    Marc


Re: Open instances

rngcntr
 

We once had a short discussion about that over on the old mailing list. Sadly, we did not find a quick solution there. However, seeing more people affected at least confirms that it does not occur only due to a misconfiguration on our side.


Authentication All the Schema's

Vinayak Bali
 

Hi,

We are working on a web application that uses JanusGraph, connecting to JanusGraph through the Java API. We need to configure authentication for all the schemas in use. We configured authentication using the following document for reference, but authentication is not working: we are getting a blank array as output from the API.
Property files are as follows:

gremlin-server.yaml

# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
 graph: conf/graph.properties,
 graph1: conf/graph1.properties,
 graph2: conf/graph2.properties
}
authentication: {
  authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.SaslAndHMACAuthenticator,
  authenticationHandler: org.janusgraph.graphdb.tinkerpop.gremlin.server.handler.SaslAndHMACAuthenticationHandler,
  config: {
    defaultUsername: user,
    defaultPassword: password,
    hmacSecret: secret,
    credentialsDb: conf/janusgraph-credentials-server.properties
  }
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: []}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536

janusgraph-credentials-server.properties

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1
storage.cql.keyspace=authentication
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

empty-sample.groovy

// Copyright 2019 JanusGraph Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// an init script that returns a Map allows explicit setting of global bindings.
def globals = [:]

// defines a sample LifeCycleHook that prints some output to the Gremlin Server console.
// note that the name of the key in the "global" map is unimportant.
globals << [hook : [
        onStartUp: { ctx ->
            ctx.logger.info("Executed once at startup of Gremlin Server.")
        },
        onShutDown: { ctx ->
            ctx.logger.info("Executed once at shutdown of Gremlin Server.")
        }
] as LifeCycleHook]

// define the default TraversalSource to bind queries to - this one will be named "g".
graph=JanusGraphFactory.open('conf/graph.properties')
graph1=JanusGraphFactory.open('conf/graph1.properties')
graph2=JanusGraphFactory.open('conf/graph2.properties')
globals << [ g : graph.traversal(), g1 : graph1.traversal(), g2:graph2.traversal() ]

I need to secure each and every schema.
For example: consider A, B, C, D, E, F as users, and graph, graph1 and graph2 as 3 schemas.
Then:
A has access to graph and graph1
B only graph
C all the schemas
and so on for all the users.
Please share your experience and feedback.

Thanks & Regards,
Vinayak


Re: Issues with controlling partitions when using Apache Spark

Mladen Marović
 

Just a quick info: I opened an issue for this and did some additional research. See https://github.com/JanusGraph/janusgraph/issues/2420 for more details.


Re: reindex job is very slow on ElasticSearch and BigTable

hadoopmarc@...
 

Hi,

No, a MixedIndex should be fine. Can you show the code lines that define the index for the maid property key? Possibly the index is restricted to a specific label, so that you have to query:
g.V().has('product', 'maid', '45324j5nu8g5r83q89u53h89g')

See also:
https://docs.janusgraph.org/index-management/index-performance/#label-constraint
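For reference, a label-constrained mixed index along the lines described in the docs is created roughly like this (a sketch; the index name 'byMaid' and backend name 'search' are illustrative):

```groovy
mgmt = graph.openManagement()
product = mgmt.getVertexLabel('product')
maid = mgmt.getPropertyKey('maid')
mgmt.buildIndex('byMaid', Vertex.class).
     addKey(maid).
     indexOnly(product).          // restricts the index to the 'product' label
     buildMixedIndex('search')    // 'search' = the configured index backend name
mgmt.commit()
```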

Best wishes,    Marc


Re: Open instances

hadoopmarc@...
 

Hi,

Good that you noticed this!
Some explanation and the cure can be found at:

https://docs.janusgraph.org/advanced-topics/recovery/#janusgraph-instance-failure

Best wishes,    Marc