
Any advice on performance concerns of JanusGraph with Cassandra & Elasticsearch?

hazalkecoglu@...
 

Hi everyone,
We are working on a project in which we would like to use JanusGraph. Our system will consist of 100MM nodes and 1B edges between those nodes.
We are going to work with the last 90 days of data, so the system needs to upload 1B edges and delete the data that falls out of the 90-day window every day.
So far we have experimented with JanusGraph on Cassandra and Elasticsearch.
 
We want to learn about your experiences and also contribute with ours during the project.
Is there anyone who has worked with such a huge volume of data? What should our concerns be when working with this kind of big data?
 
Also, what would be the best and fastest approach to uploading 1B edges every day?
Thanks a lot
 


Re: Backup & Restore of JanusGraph Data with Mixed Index Backend (Elasticsearch)

rngcntr
 

If your use case can handle the downtime, stopping writes and waiting until all changes are propagated to both the storage and the index backend sounds like a viable solution. However, I have no idea about the order of magnitude of the necessary downtime.


Re: Backup & Restore of JanusGraph Data with Mixed Index Backend (Elasticsearch)

florian.caesar
 

Yeah, good point, it's a bit hairy. Having potentially inconsistent index backups makes them much less attractive. Though I guess I could run a reindex job on just the delta between the last Scylla write time and the last ES write time.
As a simpler alternative, how about pausing write transactions for say ~1s and initiating simultaneous backups of my Scylla and ES clusters during that time?
From what I can tell, both backup mechanisms guarantee snapshot isolation. A short write pause should ensure that all writes have propagated.
What caveats do you see with this approach?
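For what it's worth, the coordination logic of that pause-and-snapshot idea can be modeled in a few lines. This is a toy Python sketch with stubbed snapshot calls; in reality step 3 would be `nodetool snapshot` on the Scylla side and the Elasticsearch snapshot API on the ES side:

```python
import threading
import time

# Toy model of the "short write pause" backup: the two snapshot arguments
# are stand-ins for the real calls (nodetool snapshot / ES snapshot API).
writes_allowed = threading.Event()
writes_allowed.set()  # writers would check this flag before committing

def coordinated_backup(snapshot_scylla, snapshot_es, quiesce_seconds=1.0):
    writes_allowed.clear()           # 1. pause write transactions
    time.sleep(quiesce_seconds)      # 2. let in-flight writes propagate
    t1 = threading.Thread(target=snapshot_scylla)
    t2 = threading.Thread(target=snapshot_es)
    t1.start(); t2.start()           # 3. snapshot both backends simultaneously
    t1.join(); t2.join()
    writes_allowed.set()             # 4. resume writes

# Stubbed usage:
results = []
coordinated_backup(lambda: results.append("scylla"),
                   lambda: results.append("es"),
                   quiesce_seconds=0.01)
```

The sketch assumes your application can actually gate writes on such a flag; with multiple JanusGraph instances you would need a cluster-wide pause instead.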

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, May 3rd, 2021 at 9:49 AM, rngcntr <florian.grieskamp@...> wrote:

Although the solution presented by Marc is also the closest to a consistent backup that I can think of, there are obviously caveats to it. Updates of values which were written after the time of the Scylla snapshot could be present in ES, corrupting the state of the index. Therefore, checking the pure existence of a vertex in Scylla may not be sophisticated enough to guarantee a consistent state. Verifying the property values explicitly can be helpful here, but that still leaves us with the question how to handle mismatches of this kind.
Just keep that in mind when using such a backup strategy in your environment.

Best regards,
Florian


Re: Backup & Restore of JanusGraph Data with Mixed Index Backend (Elasticsearch)

rngcntr
 

Although the solution presented by Marc is also the closest to a consistent backup that I can think of, there are obviously caveats to it. Updates of values which were written after the time of the Scylla snapshot could be present in ES, corrupting the state of the index. Therefore, checking the pure existence of a vertex in Scylla may not be sophisticated enough to guarantee a consistent state. Verifying the property values explicitly can be helpful here, but that still leaves us with the question how to handle mismatches of this kind.
Just keep that in mind when using such a backup strategy in your environment.

Best regards,
Florian


Re: Backup & Restore of JanusGraph Data with Mixed Index Backend (Elasticsearch)

florian.caesar
 

Awesome, yes, that's very similar to what I was planning!
It's not perfect and definitely needs to be tested thoroughly, but it should be much faster and reasonably scriptable.
I'll let you all know how it goes when I get to setting this up.. hopefully won't be long, a decade or so at most.

Thanks!

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, May 3rd, 2021 at 9:06 AM, <hadoopmarc@...> wrote:
In theory (not used in practice) the following should be possible:

  1. make a snapshot of the ScyllaDB keyspace
  2. after the ScyllaDB snapshot is written, make a snapshot of corresponding ES mixed indices
  3. restore all snapshots on separate temporary clusters (doing this manually on a production cluster is a no-go)
  4. find the latest writetime in the ScyllaDB snapshot
  5. try all ES index items later than this timestamp and remove them if the corresponding vertices cannot be retrieved from ScyllaDB
  6. make a new snapshot of the ES mixed indices
This is rather cumbersome, of course, but it would allow for a fast restore of consistent indices (this does not deal with the other issue, the partially succeeded transactions).

Best wishes,   Marc


Re: Backup & Restore of JanusGraph Data with Mixed Index Backend (Elasticsearch)

hadoopmarc@...
 

In theory (not used in practice) the following should be possible:
  1. make a snapshot of the ScyllaDB keyspace
  2. after the ScyllaDB snapshot is written, make a snapshot of corresponding ES mixed indices
  3. restore all snapshots on separate temporary clusters (doing this manually on a production cluster is a no-go)
  4. find the latest writetime in the ScyllaDB snapshot
  5. try all ES index items later than this timestamp and remove them if the corresponding vertices cannot be retrieved from ScyllaDB
  6. make a new snapshot of the ES mixed indices
This is rather cumbersome, of course, but it would allow for a fast restore of consistent indices (this does not deal with the other issue, the partially succeeded transactions).
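Step 5 is the fiddly part; its core check can be sketched as follows. This is a pure-Python stand-in for the restored snapshots (the dicts `scylla_writetimes` and `es_docs` are illustrative, not real client calls), and it only checks vertex existence, which is exactly the limitation discussed elsewhere in this thread:

```python
# Sketch of steps 4-5: find ES entries written after the latest writetime
# in the ScyllaDB snapshot whose vertex cannot be retrieved from Scylla.
#   scylla_writetimes: vertex id -> writetime from the restored keyspace
#   es_docs:           doc id    -> timestamp of the index entry

def prune_orphaned_index_entries(scylla_writetimes, es_docs):
    latest_writetime = max(scylla_writetimes.values())  # step 4
    to_delete = set()
    for doc_id, indexed_at in es_docs.items():          # step 5
        # Only entries newer than the Scylla snapshot can be inconsistent;
        # of those, delete the ones whose vertex does not exist in Scylla.
        if indexed_at > latest_writetime and doc_id not in scylla_writetimes:
            to_delete.add(doc_id)
    return to_delete

prune_orphaned_index_entries({"v1": 90, "v2": 150},
                             {"v1": 90, "v2": 160, "v3": 170})
# -> {"v3"}: indexed after the snapshot and missing from Scylla
```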

Best wishes,   Marc


Re: Backup & Restore of JanusGraph Data with Mixed Index Backend (Elasticsearch)

florian.caesar
 

Thanks again. Yeah, I might end up doing that, but it seems like a complicated solution... hmm.

Regarding the feature request, I'll dig into the code and ask around the janusgraph-dev group :)


Re: Backup & Restore of JanusGraph Data with Mixed Index Backend (Elasticsearch)

Boxuan Li
 

Yeah, reindexing can be slow in that case. You could try the transaction recovery mechanism described in https://docs.janusgraph.org/advanced-topics/recovery/#transaction-failure, which makes use of a write-ahead log and requires a dedicated process that runs transaction recovery continuously.
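For reference, enabling that mechanism is mostly a matter of configuration; the fragment below is a sketch based on the linked docs, so check them for your version before relying on it:

```properties
# Write transactions to a write-ahead log before persisting them
# (must be enabled before the failures you want to recover from):
tx.log-tx = true
# Optional: how long a transaction may take to commit before it is
# considered failed (milliseconds):
tx.max-commit-time = 10000
```

The recovery process itself is then started from application code; the docs' example uses JanusGraphFactory.startTransactionRecovery(graph, startTime).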

> Can I somehow make Janusgraph respect that and fail the transaction if it can't persist to the indexing backend

Unfortunately no. That seems to be a legitimate requirement and even I don’t know why this is not allowed at the moment. You may want to raise a feature request on GitHub, and/or fork JanusGraph and apply that change.

On May 1, 2021, at 10:16 PM, florian.caesar via lists.lfaidata.foundation <florian.caesar=protonmail.com@...> wrote:

Hi Boxuan, 

thank you for the detailed response.

What if inconsistency between my primary storage and my indexing backend is not tolerable? Can I somehow make Janusgraph respect that and fail the transaction if it can't persist to the indexing backend? 

As for the recovery options, reindexing seems reasonable. Though I'm worried that reindexing all mixed indices that way will be very slow for large graphs with many mixed indices. 

Florian




-------- Original Message --------
On 1 May 2021, 13:46, Boxuan Li < liboxuan@...> wrote:

Hi Florian,

JanusGraph's philosophy is that your primary storage (ScyllaDB in your case) is the primary and authoritative source of truth, and inconsistency between your mixed index backend and storage layer is tolerable. For example, your transaction would succeed if data is persisted successfully in your primary storage but not the mixed index backend. To fix the inconsistency, you could periodically run the reindex OLAP job, and you could set up the transaction recovery process as described in https://docs.janusgraph.org/advanced-topics/recovery/#transaction-failure.

For your use case, I would suggest running a reindex job after you restore data.

Cheers,
Boxuan


Re: Configured graph factory not working after making changes to gremlin-server.yaml

hadoopmarc@...
 

Hi Sai,

In your last post, a line with ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map)); is missing.

A complete transcript that works out of the box with janusgraph-full-0.5.3:

Terminal1
bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration-inmemory.yaml

Terminal2
bin/gremlin.sh

gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[96f366a4-9255-488a-b891-134df4a5f8a6]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[96f366a4-9255-488a-b891-134df4a5f8a6] - type ':remote console' to return to local mode
gremlin> ConfiguredGraphFactory.getGraphNames()
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "inmemory");
==>null
gremlin> map.put("graph.graphname", "graph1");
==>null
gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
==>null
gremlin> graph1 = ConfiguredGraphFactory.open("graph1");
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> g1 = graph1.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
gremlin>

Best wishes,    Marc


Re: Backup & Restore of JanusGraph Data with Mixed Index Backend (Elasticsearch)

florian.caesar
 

Hi Boxuan,

thank you for the detailed response.

What if inconsistency between my primary storage and my indexing backend is not tolerable? Can I somehow make Janusgraph respect that and fail the transaction if it can't persist to the indexing backend?

As for the recovery options, reindexing seems reasonable. Though I'm worried that reindexing all mixed indices that way will be very slow for large graphs with many mixed indices.

Florian




-------- Original Message --------
On 1 May 2021, 13:46, Boxuan Li < liboxuan@...> wrote:

Hi Florian,

JanusGraph's philosophy is that your primary storage (ScyllaDB in your case) is the primary and authoritative source of truth, and inconsistency between your mixed index backend and storage layer is tolerable. For example, your transaction would succeed if data is persisted successfully in your primary storage but not the mixed index backend. To fix the inconsistency, you could periodically run the reindex OLAP job, and you could set up the transaction recovery process as described in https://docs.janusgraph.org/advanced-topics/recovery/#transaction-failure.

For your use case, I would suggest running a reindex job after you restore data.

Cheers,
Boxuan


Re: Backup & Restore of JanusGraph Data with Mixed Index Backend (Elasticsearch)

Boxuan Li
 

Hi Florian,

JanusGraph's philosophy is that your primary storage (ScyllaDB in your case) is the primary and authoritative source of truth, and inconsistency between your mixed index backend and storage layer is tolerable. For example, your transaction would succeed if data is persisted successfully in your primary storage but not the mixed index backend. To fix the inconsistency, you could periodically run the reindex OLAP job, and you could set up the transaction recovery process as described in https://docs.janusgraph.org/advanced-topics/recovery/#transaction-failure.

For your use case, I would suggest running a reindex job after you restore data.
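For anyone following along: the reindex Boxuan mentions is a management-system job. Roughly, in the Gremlin Console (a sketch only; 'mixedIdx' is a placeholder for your mixed index name, and the index-management docs for your version are authoritative):

```groovy
mgmt = graph.openManagement()
// REINDEX re-populates the index from the authoritative storage backend
mgmt.updateIndex(mgmt.getGraphIndex('mixedIdx'), SchemaAction.REINDEX).get()
mgmt.commit()
```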

Cheers,
Boxuan


Re: Configured graph factory not working after making changes to gremlin-server.yaml

Sai Supraj R
 

Hi Marc, 

I have mentioned all the properties in the config file. I am not sure why the configurations are not applied when the Gremlin server is restarted.


gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[35b35e81-8881-420e-9a6a-092114b96202]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[35b35e81-8881-420e-9a6a-092114b96202] - type ':remote console' to return to local mode
gremlin> map = new HashMap();
gremlin> ConfiguredGraphFactory.getGraphNames()
gremlin> ConfiguredGraphFactory.open("******")
Please create configuration for this graph using the ConfigurationManagementGraph#createConfiguration API.
Type ':help' or ':h' for help.
Display stack trace? [yN]N

Thanks
Sai

On Fri, Apr 30, 2021 at 9:30 AM Sai Supraj R via lists.lfaidata.foundation <suprajratakonda=gmail.com@...> wrote:
Hi Marc,

I tried commenting it out and setting it to false, but I got the same error message.

gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
Must provide vertex id
Type ':help' or ':h' for help.
Display stack trace? [yN]N

Thanks
Sai

On Fri, Apr 30, 2021 at 2:34 AM <hadoopmarc@...> wrote:
Hi Sai,

I suspect this is related to your setting:

#do not auto generate graph vertex id
graph.set-vertex-id=true

Can you try without?

Best wishes,   Marc


Re: Configured graph factory not working after making changes to gremlin-server.yaml

Sai Supraj R
 

Hi Marc,

I tried commenting it out and setting it to false, but I got the same error message.

gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
Must provide vertex id
Type ':help' or ':h' for help.
Display stack trace? [yN]N

Thanks
Sai

On Fri, Apr 30, 2021 at 2:34 AM <hadoopmarc@...> wrote:
Hi Sai,

I suspect this is related to your setting:

#do not auto generate graph vertex id
graph.set-vertex-id=true

Can you try without?

Best wishes,   Marc


Backup & Restore of JanusGraph Data with Mixed Index Backend (Elasticsearch)

florian.caesar
 

Hi,

what is the recommended approach for backing up the JanusGraph storage layer (ScyllaDB in my case) together with a mixed index backend (Elasticsearch)?
I know I can back them up & restore them separately, but that seems like it might lead to inconsistencies since it's not coordinated.
I would appreciate input from anyone who has run JanusGraph in production - thanks!

Regards,

Florian


Re: Configured graph factory not working after making changes to gremlin-server.yaml

hadoopmarc@...
 

Hi Sai,

I suspect this is related to your setting:

#do not auto generate graph vertex id
graph.set-vertex-id=true

Can you try without?

Best wishes,   Marc


Re: Transaction Cache vs. DB Cache Questions

rngcntr
 

Hi Joe,

just as Boxuan already said, the cache size is crucial for this task. But assuming your graph is large, only a fraction of the vertices will fit into the cache even if scaled appropriately. The problem that I see here is that for large graphs, the chance of finding a vertex in the cache is small, if you iterate over your queries in a random order. If you can come up with an execution order where vertices which have a similar 2-hop neighborhood are processed in temporal proximity to each other, that would greatly improve the cache hit rate.
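To make that concrete, here is a toy sketch of such a reordering. It is plain Python, nothing JanusGraph-specific; `community_of` stands in for whatever neighborhood signal you have available, e.g. a precomputed community or partition label:

```python
from collections import defaultdict

# Reorder per-vertex queries so that vertices with overlapping
# neighborhoods are processed back to back, improving cache hit rates.
# `community_of` is an assumed precomputed mapping, e.g. from a
# community-detection or graph-partitioning run.

def locality_order(vertex_ids, community_of):
    """Group vertex ids by community, keeping each group contiguous."""
    buckets = defaultdict(list)
    for v in vertex_ids:
        buckets[community_of(v)].append(v)
    ordered = []
    for community in sorted(buckets):
        ordered.extend(buckets[community])
    return ordered

# Parity as a stand-in "community" signal:
order = locality_order([3, 1, 4, 2], lambda v: v % 2)  # -> [4, 2, 3, 1]
```

Any ordering that keeps similar neighborhoods adjacent works; the point is only that random query order is the worst case for the cache.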

Best regards,
Florian


Re: Configured graph factory not working after making changes to gremlin-server.yaml

Sai Supraj R
 

Hi, 

This is the gremlin-server.yaml file

# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
        ConfigurationManagementGraph: conf/janusgraph-scylla-configurationgraph.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: []}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536

This is the properties file:

# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# JanusGraph configuration sample: Cassandra over a socket
#
# This file connects to a Cassandra daemon running on localhost via
# Thrift.  Cassandra must already be started before starting JanusGraph
# with this file.

# The implementation of graph factory that will be used by gremlin server
#
# Default:    org.janusgraph.core.JanusGraphFactory
# Data Type:  String
# Mutability: LOCAL
# gremlin.graph=org.janusgraph.core.JanusGraphFactory

gremlin.graph = org.janusgraph.core.ConfiguredGraphFactory
graph.graphname=ConfigurationManagementGraph

# The primary persistence provider used by JanusGraph.  This is required.
# It should be set one of JanusGraph's built-in shorthand names for its
# standard storage backends (shorthands: berkeleyje, cassandrathrift,
# cassandra, astyanax, embeddedcassandra, cql, hbase, inmemory) or to the
# full package and classname of a custom/third-party StoreManager
# implementation.
#
# Default:    (no default value)
# Data Type:  String
# Mutability: LOCAL

storage.backend=cql

# The hostname or comma-separated list of hostnames of storage backend
# servers.  This is only applicable to some storage backends, such as
# cassandra and hbase.
#
# Default:    127.0.0.1
# Data Type:  class java.lang.String[]
# Mutability: LOCAL

storage.hostname=******

# The name of JanusGraph's keyspace.  It will be created if it does not
# exist.
#
# Default:    janusgraph
# Data Type:  String
# Mutability: LOCAL

storage.cql.keyspace=*****

# Whether to enable JanusGraph's database-level cache, which is shared
# across all transactions. Enabling this option speeds up traversals by
# holding hot graph elements in memory, but also increases the likelihood
# of reading stale data.  Disabling it forces each transaction to
# independently fetch graph elements from storage before reading/writing
# them.
#
# Default:    false
# Data Type:  Boolean
# Mutability: MASKABLE

cache.db-cache = true

# How long, in milliseconds, database-level cache will keep entries after
# flushing them.  This option is only useful on distributed storage
# backends that are capable of acknowledging writes without necessarily
# making them immediately visible.
#
# Default:    50
# Data Type:  Integer
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend.  After starting the database for the first
# time, this file's copy of this setting is ignored.  Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
cache.db-cache-clean-wait = 20

# Default expiration time, in milliseconds, for entries in the
# database-level cache. Entries are evicted when they reach this age even
# if the cache has room to spare. Set to 0 to disable expiration (cache
# entries live forever or until memory pressure triggers eviction when set
# to 0).
#
# Default:    10000
# Data Type:  Long
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend.  After starting the database for the first
# time, this file's copy of this setting is ignored.  Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
cache.db-cache-time = 180000

# Size of JanusGraph's database level cache.  Values between 0 and 1 are
# interpreted as a percentage of VM heap, while larger values are
# interpreted as an absolute size in bytes.
#
# Default:    0.3
# Data Type:  Double
# Mutability: MASKABLE
cache.db-cache-size = 0.5
storage.cql.write-consistency-level = QUORUM
storage.cql.read-consistency-level = QUORUM
#storage.cql.replication-strategy-class = "NetworkTopologyStrategy"
#storage.cql.replication-strategy-options = "us-east,3"
storage.cql.protocol-version=4
storage.read-time=100000
storage.write-time=100000
#do not auto generate graph vertex id
graph.set-vertex-id=true

When i try to open the graph i am getting this error:
gremlin> ConfiguredGraphFactory.open("ConfigurationManagementGraph")
Please create configuration for this graph using the ConfigurationManagementGraph#createConfiguration API.
Type ':help' or ':h' for help.
Display stack trace? [yN]N

When trying to create a new graph:
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "cql");
==>null
gremlin> map.put("storage.hostname", "127.0.0.1");
==>null
gremlin> map.put("graph.graphname", "graph1");
==>null
gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
Must provide vertex id
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.IllegalArgumentException: Must provide vertex id

Thanks
Sai



On Thu, Apr 29, 2021 at 1:04 AM Vinayak Bali <vinayakbali16@...> wrote:
Hi,

To investigate the issue, please share the recent logs, gremlin-server.yaml, and the janusgraph.sh which is used to start the service.

Thanks & Regards
Vinayak

On Thu, 29 Apr 2021, 4:13 am Sai Supraj R, <suprajratakonda@...> wrote:
Hi,

But I am not starting the Gremlin server with gremlin-server-cql-es.yaml. I am starting it with gremlin-server.yaml, and I made the changes suggested in the JanusGraph documentation w.r.t. the ConfiguredGraphFactory.

Thanks
Sai

On Wed, Apr 28, 2021 at 3:21 PM Vinayak Bali <vinayakbali16@...> wrote:
Hi,

Make changes in gremlin-server-cql-es.yaml file. 

Thanks 

On Wed, 28 Apr 2021, 11:52 pm Sai Supraj R, <suprajratakonda@...> wrote:
Hi,

0.5.3

Thanks
Sai

On Wed, Apr 28, 2021 at 2:21 PM Vinayak Bali <vinayakbali16@...> wrote:
Hi,

Which JanusGraph version is being used?

Regards,
Vinayak

On Wed, 28 Apr 2021, 11:23 pm , <suprajratakonda@...> wrote:
I am trying to use the ConfiguredGraphFactory. I made changes to gremlin-server.yaml and configuration-management.properties. I am getting the following error.

gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[b1b934d6-3f17-40b6-b6cb-fd735c605c5a]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[b1b934d6-3f17-40b6-b6cb-fd735c605c5a] - type ':remote console' to return to local mode
gremlin> ConfiguredGraphFactory.getGraphNames()
gremlin> ConfiguredGraphFactory.open("ConfigurationManagementGraph");
Please create configuration for this graph using the ConfigurationManagementGraph#createConfiguration API.
Type ':help' or ':h' for help.
Display stack trace? [yN]N
gremlin> ConfiguredGraphFactory.create("ConfigurationManagementGraph");
Please create a template Configuration using the ConfigurationManagementGraph#createTemplateConfiguration API.
Type ':help' or ':h' for help.
Display stack trace? [yN]N


Re: Transaction Cache vs. DB Cache Questions

Boxuan Li
 

Hi Joe,

Vertex properties are indeed cached both in DB cache and transaction cache. If you check out https://docs.janusgraph.org/advanced-topics/data-model/, you will find that the doc says,

JanusGraph stores graphs in adjacency list format which means that a graph is stored as a collection of vertices with their adjacency list. The adjacency list of a vertex contains all of the vertex’s incident edges (and properties).

Thus, I believe the “adjacency lists” wording used in https://docs.janusgraph.org/basics/cache/ actually refers to vertices together with vertex properties (and of course, meta-properties), and edges (and of course, edge properties).

If you refactor your code and use multiple threads sharing a common transaction, then yes, the properties will be stored in transaction cache. That cache is not based on thread-local objects, so using multi-threading does not harm the cache here.

Regarding the performance, you may need to tune your configs, e.g. try increasing cache.db-cache-size, to reduce the chance of frequent cache eviction.

Best regards,
Boxuan

On Apr 29, 2021, at 2:02 PM, hadoopmarc@... wrote:

Hi Joe,

Good question and I do not know the answer. Indeed, the documentation suggests that the DB cache stores less information than the transaction cache, but it is not explicit about vertex properties. It is not explicit about vertex properties in the transaction cache either, but I cannot remember users having problems with missing vertex properties there.

TinkerPop/JanusGraph support multi-threaded transactions. When using these (maybe, you already suggested this in your final line), you are sure that vertices are available from the transaction cache, provided its configs match your traversal.

Best wishes,   Marc


Re: Configured graph factory not working after making changes to gremlin-server.yaml

hadoopmarc@...
 

Hi Sai,

"ConfigurationManagementGraph" is not meant to be opened. Please follow the exact instructions described in:

https://docs.janusgraph.org/basics/configured-graph-factory/#configurationmanagementgraph

Best wishes,    Marc


Re: Transaction Cache vs. DB Cache Questions

hadoopmarc@...
 

Hi Joe,

Good question and I do not know the answer. Indeed, the documentation suggests that the DB cache stores less information than the transaction cache, but it is not explicit about vertex properties. It is not explicit about vertex properties in the transaction cache either, but I cannot remember users having problems with missing vertex properties there.

TinkerPop/JanusGraph support multi-threaded transactions. When using these (maybe, you already suggested this in your final line), you are sure that vertices are available from the transaction cache, provided its configs match your traversal.

Best wishes,   Marc
