
Re: reindex job is very slow on ElasticSearch and BigTable

hadoopmarc@...
 

OK, thanks for confirming the stacktrace. You can report this behavior as an issue on https://github.com/JanusGraph/janusgraph/issues, referring to this thread. It is still not clear to me how this exception can occur, because the BigTable compatibility layer reuses the HBase backend, for which graph.getBackend().getStoreManager().getHadoopManager() is available.

So, I am afraid there is no quick fix for your issue, unless you start debugging MapReduceIndexManagement for BigTable yourself. Maybe simply reloading the graph is an option.


Best wishes,    Marc


Re: Connecting to Multiple Schemas using Java

hadoopmarc@...
 

Hi Vinayak,

The TinkerPop ref docs give the following code fragment for connecting with the cluster method:

import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

Cluster cluster = Cluster.open();
GraphTraversalSource g1 = traversal().withRemote(DriverRemoteConnection.using(cluster, "g1"));
GraphTraversalSource g2 = traversal().withRemote(DriverRemoteConnection.using(cluster, "g2"));
Is this what you tried (I do not see it in your question)?

Best wishes,    Marc


Re: Issues with controlling partitions when using Apache Spark

Mladen Marović
 

Thanks for the responses.

I'll create a GitHub issue for this then, and also create a PR with the changes that fixed this issue for me, in case anyone finds it useful.

I'm also interested in doing the spark-cassandra-connector implementation; however, it might take a while before I get around to it.


Re: Problem with index never becoming ENABLED.

vamsi.lingala@...
 

Register the index before you enable it.
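
A minimal sketch of that sequence with the JanusGraph management API (the index name "byX" and the already-open graph instance are assumed for illustration):

import org.janusgraph.core.schema.JanusGraphManagement;
import org.janusgraph.core.schema.SchemaAction;
import org.janusgraph.core.schema.SchemaStatus;
import org.janusgraph.graphdb.database.management.ManagementSystem;

// Register the index on all JanusGraph instances...
JanusGraphManagement mgmt = graph.openManagement();
mgmt.updateIndex(mgmt.getGraphIndex("byX"), SchemaAction.REGISTER_INDEX).get();
mgmt.commit();

// ...wait until every instance reports the REGISTERED status...
ManagementSystem.awaitGraphIndexStatus(graph, "byX").status(SchemaStatus.REGISTERED).call();

// ...and only then enable it.
mgmt = graph.openManagement();
mgmt.updateIndex(mgmt.getGraphIndex("byX"), SchemaAction.ENABLE_INDEX).get();
mgmt.commit();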


Re: Janusgraph 0.5.3 potential memory leak

Oleksandr Porunov
 

What exactly do you mean by that? Do you mean to change the implementation of `ofStaticBuffer`?
I mean that we possibly need to change the logic back to use `StaticArrayEntryList.of(Iterable<E> ... ...)` instead of `StaticArrayEntryList.of(Iterator<E> ... ...)`. If so, we may need to use `Lazy.of` again, but then we need to think about what exactly it returns (previously it returned an ArrayList, but that is again additional computation which would be better to avoid).
We may also think about improving `StaticArrayEntryList.of(Iterator<E> ... ...)` so that it does not cause memory problems, but I haven't looked deep into the logic yet.
The first thing I'm thinking about is that maybe we could change `StaticArrayEntryList.of(Iterator<E> ... ...)` to have the same logic as `StaticArrayEntryList.of(Iterable<E> ... ...)`. Of course, we can't traverse the iterator twice, but we could store intermediate elements in a singly linked list. I guess something like:
class SinglyLinkedList<E> {
    E value;
    SinglyLinkedList<E> nextElement;
}
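
A rough sketch of how such a buffer could be used (hypothetical code, not an actual JanusGraph implementation): consume the iterator exactly once, chaining nodes while counting them, so the final array can be allocated at its exact size in a single step.

import java.util.Iterator;

// Consume the iterator once, buffering elements in the linked list and
// counting them; then allocate the result array exactly once.
static <E> Object[] collectOnce(Iterator<E> elements) {
    SinglyLinkedList<E> head = null, tail = null;
    int count = 0;
    while (elements.hasNext()) {
        SinglyLinkedList<E> node = new SinglyLinkedList<>();
        node.value = elements.next();
        if (tail == null) head = node;
        else tail.nextElement = node;
        tail = node;
        count++;
    }
    Object[] result = new Object[count]; // single, right-sized allocation
    int i = 0;
    for (SinglyLinkedList<E> n = head; n != null; n = n.nextElement) {
        result[i++] = n.value;
    }
    return result;
}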
That said, I didn't compare space and time complexity of `StaticArrayEntryList.of(Iterable<E> ... ...)` vs `StaticArrayEntryList.of(Iterator<E> ... ...)`.


Connecting to Multiple Schemas using Java

Vinayak Bali
 

Hi,
I am trying to connect to multiple schemas through Java using the Cluster method. The properties files are as follows:

gremlin-server.yaml
# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  graph1: conf/graph1.properties,
  graph2: conf/graph2.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: []}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536

graph1.properties

# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# JanusGraph configuration sample: Cassandra & Elasticsearch over sockets
#
# This file connects to Cassandra and Elasticsearch services running
# on localhost over the CQL API and the Elasticsearch native
# "Transport" API on their respective default ports.  The Cassandra
# and Elasticsearch services must already be running before starting
# JanusGraph with this file.

# The implementation of graph factory that will be used by gremlin server
#
# Default:    org.janusgraph.core.JanusGraphFactory
# Data Type:  String
# Mutability: LOCAL
gremlin.graph=org.janusgraph.core.JanusGraphFactory

# The primary persistence provider used by JanusGraph.  This is required.
# It should be set one of JanusGraph's built-in shorthand names for its
# standard storage backends (shorthands: berkeleyje, cassandrathrift,
# cassandra, astyanax, embeddedcassandra, cql, hbase, inmemory) or to the
# full package and classname of a custom/third-party StoreManager
# implementation.
#
# Default:    (no default value)
# Data Type:  String
# Mutability: LOCAL
storage.backend=cql

# The hostname or comma-separated list of hostnames of storage backend
# servers.  This is only applicable to some storage backends, such as
# cassandra and hbase.
#
# Default:    127.0.0.1
# Data Type:  class java.lang.String[]
# Mutability: LOCAL
storage.hostname=127.0.0.1

# The name of JanusGraph's keyspace.  It will be created if it does not
# exist.
#
# Default:    janusgraph
# Data Type:  String
# Mutability: LOCAL
storage.cql.keyspace=graph1

# Whether to enable JanusGraph's database-level cache, which is shared
# across all transactions. Enabling this option speeds up traversals by
# holding hot graph elements in memory, but also increases the likelihood
# of reading stale data.  Disabling it forces each transaction to
# independently fetch graph elements from storage before reading/writing
# them.
#
# Default:    false
# Data Type:  Boolean
# Mutability: MASKABLE
cache.db-cache = true

# How long, in milliseconds, database-level cache will keep entries after
# flushing them.  This option is only useful on distributed storage
# backends that are capable of acknowledging writes without necessarily
# making them immediately visible.
#
# Default:    50
# Data Type:  Integer
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend.  After starting the database for the first
# time, this file's copy of this setting is ignored.  Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
cache.db-cache-clean-wait = 20

# Default expiration time, in milliseconds, for entries in the
# database-level cache. Entries are evicted when they reach this age even
# if the cache has room to spare. Set to 0 to disable expiration (cache
# entries live forever or until memory pressure triggers eviction when set
# to 0).
#
# Default:    10000
# Data Type:  Long
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend.  After starting the database for the first
# time, this file's copy of this setting is ignored.  Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
cache.db-cache-time = 180000

# Size of JanusGraph's database level cache.  Values between 0 and 1 are
# interpreted as a percentage of VM heap, while larger values are
# interpreted as an absolute size in bytes.
#
# Default:    0.3
# Data Type:  Double
# Mutability: MASKABLE
cache.db-cache-size = 0.25

# Connect to an already-running ES instance on localhost

# The indexing backend used to extend and optimize JanusGraph's query
# functionality. This setting is optional.  JanusGraph can use multiple
# heterogeneous index backends.  Hence, this option can appear more than
# once, so long as the user-defined name between "index" and "backend" is
# unique among appearances. Similar to the storage backend, this should be
# set to one of JanusGraph's built-in shorthand names for its standard
# index backends (shorthands: lucene, elasticsearch, es, solr) or to the
# full package and classname of a custom/third-party IndexProvider
# implementation.
#
# Default:    elasticsearch
# Data Type:  String
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend.  After starting the database for the first
# time, this file's copy of this setting is ignored.  Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
index.search.backend=elasticsearch

# The hostname or comma-separated list of hostnames of index backend
# servers.  This is only applicable to some index backends, such as
# elasticsearch and solr.
#
# Default:    127.0.0.1
# Data Type:  class java.lang.String[]
# Mutability: MASKABLE
index.search.hostname=127.0.0.1

graph2.properties is the same as graph1.properties; the only change is the schema name.

empty-sample.groovy

// Copyright 2019 JanusGraph Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// an init script that returns a Map allows explicit setting of global bindings.
def globals = [:]

// defines a sample LifeCycleHook that prints some output to the Gremlin Server console.
// note that the name of the key in the "global" map is unimportant.
globals << [hook : [
        onStartUp: { ctx ->
            ctx.logger.info("Executed once at startup of Gremlin Server.")
        },
        onShutDown: { ctx ->
            ctx.logger.info("Executed once at shutdown of Gremlin Server.")
        }
] as LifeCycleHook]

// define the TraversalSources to bind queries to - these will be named "g1" and "g2".
graph1=JanusGraphFactory.open('conf/graph1.properties')
graph2=JanusGraphFactory.open('conf/graph2.properties')
globals << [ g1 : graph1.traversal() , g2 : graph2.traversal()]

When I run a Gremlin query with g1 or g2, I get an error that g1/g2 is not defined.
But if I use g, it uses graph2 and returns the result.
How can we connect to different schemas using traversals?

Thanks & Regards,
Vinayak


Re: Janusgraph 0.5.3 potential memory leak

rngcntr
 

What @porunov mentions looks quite interesting. When I made the change in the code, I didn't actually notice that I changed the signature that is used for `ofStaticBuffer`. But as you mentioned, it now looks like the reason to use `Lazy.of` is gone in the newer version using the `Iterator`, and thus looping twice cannot be the issue.

I think, we can change the old solution to return an Iterable as well but don't call `iterator` for resultSet 2 times.
What exactly do you mean by that? Do you mean to change the implementation of `ofStaticBuffer`?

One thing that I've found is that `StaticArrayEntryList.of(Iterator<E> ... ...)` repeatedly calls a self-implemented method called `ensureSpace` which allocates a new array twice as large as the old one and copies the entries over to the new one. Although the JVM should GC the old (and unused) array, this behavior seems to me like it is prone to cause memory leaks if the unused arrays are not dropped correctly. This method is not used in the `StaticArrayEntryList.of(Iterable<E> ... ...)` implementation.
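
For illustration, the growth pattern described above looks roughly like this (a simplified sketch, not the actual JanusGraph code):

// When the backing array is full, allocate one twice as large and copy the
// used portion over; every such step abandons the old array to the GC.
static byte[] ensureSpace(byte[] data, int used, int required) {
    if (used + required <= data.length) return data; // still fits
    byte[] larger = new byte[Math.max(data.length * 2, used + required)];
    System.arraycopy(data, 0, larger, 0, used);
    return larger; // the previous array becomes garbage
}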


Re: Transaction Management

rngcntr
 

We've had another post about memory leaks just the other day, here: https://lists.lfaidata.foundation/g/janusgraph-users/message/5544
Do you think what you encountered is a duplicate of that problem or is it something different?


Re: reindex job is very slow on ElasticSearch and BigTable

vamsi@...
 

Got the same error:

throw new IllegalArgumentException("Store manager class " + graph.getBackend().getStoreManagerClass() + "is not supported");


Re: Janusgraph 0.5.3 potential memory leak

Oleksandr Porunov
 

One more thing I noticed is that previously we were passing `Iterable` and right now we are passing `Iterator` to `ofStaticBuffer`, and those methods are actually computed differently.
Here are the first and the second method:
private static <E,D> EntryList of(Iterable<E> elements, StaticArrayEntry.GetColVal<E,D> getter, StaticArrayEntry.DataHandler<D> dataHandler)
private static <E,D> EntryList of(Iterator<E> elements, StaticArrayEntry.GetColVal<E,D> getter, StaticArrayEntry.DataHandler<D> dataHandler)

If we check the code, their implementations are slightly different. The first method iterates over `elements` twice and computes something, whereas the second method iterates over `elements` only once.
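
To illustrate the difference (a generic sketch, not the actual `StaticArrayEntryList` code): with an `Iterable`, `iterator()` can be called repeatedly, so the exact space can be measured in a first pass and the data copied in a second, with no intermediate buffering; a plain `Iterator` is consumed after one traversal, which rules this approach out.

import java.util.function.ToIntFunction;

// Pass 1 over an Iterable: measure exactly how much space is needed.
// A second for-loop over the same Iterable can then copy the data into
// a single right-sized array. An Iterator allows only one of these passes.
static <E> int requiredSpace(Iterable<E> elements, ToIntFunction<E> sizeOf) {
    int total = 0;
    for (E e : elements) {
        total += sizeOf.applyAsInt(e);
    }
    return total;
}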

I do understand now why we used `Lazy.of` previously. It's just because we were looping over `elements` twice instead of once.
I guess the main problem in the previous model was that we were adding all elements into an ArrayList inside the `Lazy.of` code. I think we can change the old solution to return an Iterable as well, but without calling `iterator` on the resultSet twice.
That said, these are just some quick observations. I didn't go deep into the logic.


Transaction Management

ryssavage@...
 

Hello. I have been having memory leaks while using JanusGraph and found that there were a few places where I was not explicitly closing transactions, and thought that might be the culprit. I am now closing all transactions explicitly but still get out-of-memory errors. Someone suggested that I look at the txlog table in my Cassandra database to see if there are any stray transactions there. I see 14 rows in that table. Before I go and mess with that table, I would like to understand what it is for and what the rows inside it indicate. Are they always indicative of stale transactions?

I have done some local tests: I ran a whole bunch of queries where I closed the transactions and never saw anything pop up in this table. Then I ran a whole bunch of queries where I didn't close the transaction, and one row appeared in this table.

Can someone please explain what is going on here?


Re: Janusgraph 0.5.3 potential memory leak

sergeymetallic@...
 

I did not figure out the reason for the problem, but what is interesting is that the CPU does not recover even after an hour or two, and the process continues reading from Scylla at a pretty high speed. It looks like the reading process was not interrupted properly. The root cause of the issue is not obvious and maybe requires deeper profiling.


Re: Janusgraph 0.5.3 potential memory leak

Oleksandr Porunov
 

Thank you for reporting this bug!

That's interesting. The one difference I see is that now the code performs `rs.iterator()` immediately (and not lazily as it did previously). That said, I didn't check whether that's the root cause of the problem.
Possibly `rs.iterator()` causes some issues with memory management in that place (line 328 in the PR), but it should be verified. I guess we need to check whether `rs.iterator()` adds any memory pressure during iterator construction.
My point is that `Lazy.of` (which was removed in the PR) memoizes the computation. Thus, repeated calls to `lazyList.get()` will always return the same object, whereas repeated calls to `rs.iterator()` create new, distinct iterators.
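
For comparison, a memoizing supplier along the lines of what `Lazy.of` provided might look like this (a generic sketch, not the actual JanusGraph class):

import java.util.function.Supplier;

// The delegate runs at most once; every later get() returns the same cached
// object. This is unlike rs.iterator(), which hands out a fresh iterator on
// each call.
final class Memoized<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private T cached;
    private boolean computed;

    Memoized(Supplier<T> delegate) { this.delegate = delegate; }

    @Override
    public synchronized T get() {
        if (!computed) {
            cached = delegate.get();
            computed = true;
        }
        return cached;
    }
}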
That said, it's just a spontaneous guess and the problem might be with something else.


Recommended way to perform Schema / Data migration

nick.ood17@...
 

Hello,

I would like to ask the following:

Is there any recommended way to perform Schema/Data migration?

Thanks in advance,
Nick.


Re: Janusgraph 0.5.3 potential memory leak

rngcntr
 

Hi there!

What you describe looks very interesting and is definitely not intended behavior. You mentioned my PR which seems to cause these troubles. That's quite interesting because this PR was actually merged to *improve* memory handling, not *worsen* it :P

Since the PR is rather small and you have probably already had a look at the changes it made: did you find anything that looks suspicious right away? I would be happy to find and fix this bug, and it would be great if you shared everything you have already found out.


Re: Issues with controlling partitions when using Apache Spark

hadoopmarc@...
 

Hi Mladen,

Having answered several questions about the JanusGraph InputFormats, I can confirm that many users encounter problems with the size of the input splits. This is the case in particular for the HBaseInputFormat, where input splits are equal to HBase regions and HBase requires regions to have a size on the order of 10GB (compressed binary data!). Users could only work around this by manually and temporarily splitting the HBase regions. For the CassandraInputFormat, problems surface less often because a default of about 500 partitions is used, so you need a lot of data before partition size becomes a limitation.

So, I also encourage you to contribute, if possible!

Also note that there is a fundamental problem with OLAP on graphs: traversing a graph implies shuffling between partitions, and this is only efficient if the entire graph fits in the cluster memory. So, where the scalability of JanusGraph OLTP queries is limited by disk space and the performance of the indexing backend, the scalability of OLAP queries is limited by cluster memory.

Best wishes,    Marc


Re: Janusgraph 0.5.3 potential memory leak

sergeymetallic@...
 

After some research, I figured out that rolling back this PR helps: https://github.com/JanusGraph/janusgraph/pull/2080/files#


Janusgraph 0.5.3 potential memory leak

sergeymetallic@...
 

JG 0.5.3(same on 0.5.2), cannot be reproduced on JG 0.3.2
Backend: scyllaDB
Indexing backend: ElasticSearch

Steps to reproduce (a schema sketch follows the list):
1) Create a node with a composite index for the field "X"
2) Create another kind (Y) of node and fill it with a lot of data (several million nodes)
3) Create edges between node X and all the nodes Y with the label L
4) Execute the following query in Gremlin: g.V().has("X","value").out("Y").valueMap(true)
5) The query should time out after some time
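
A sketch of the schema setup for steps 1-3 (the names "X", "byX", and "L" are only illustrative):

import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

// Composite index on property "X" (step 1) and edge label "L" (step 3).
JanusGraphManagement mgmt = graph.openManagement();
PropertyKey x = mgmt.makePropertyKey("X").dataType(String.class).make();
mgmt.buildIndex("byX", Vertex.class).addKey(x).buildCompositeIndex();
mgmt.makeEdgeLabel("L").make();
mgmt.commit();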

The main idea: only Scylla/Cassandra is involved in the query.

Expected result: Janusgraph operates normally

Observed result: JanusGraph starts consuming all the allocated memory, and one of the CPU cores is loaded at 100%; another execution will load another core, and so on until none are available. The CPU load and memory consumption persist even if there is no further interaction with the system. In the end, JG becomes unresponsive.

Flame chart looks like this:
[flame chart image]


Re: Issues with controlling partitions when using Apache Spark

Evgenii Ignatev
 

Hello Mladen,

Yes, we have experienced this issue as well, although we weren't able to fix it.

Your solution sounds very interesting; could you share your enhancement as a PR (even an unfinished one)?
We did some analysis of the source code back then, so I might be able to help with the PR/tests - feel free to contact me.

Best regards,
Evgenii Ignatev.



Re: Issues with controlling partitions when using Apache Spark

Florian Hockmann
 

Hi Mladen,

 

I wasn’t aware that the CqlInputFormat we’re using is considered legacy. Looks then like we should migrate to spark-cassandra-connector. Could you please create an issue on GitHub for this?

And if you already have an implementation ready for this, then it would of course be really great if you could contribute it with a PR.

 

Regards,

Florian

 

From: janusgraph-users@... <janusgraph-users@...> On Behalf Of Mladen Marovic
Sent: Monday, January 25, 2021 17:34
To: janusgraph-users@...
Subject: [janusgraph-users] Issues with controlling partitions when using Apache Spark

 

Hey there!

 

I've recently been working on some Apache Spark jobs for Janusgraph via hadoop-gremlin (as described on https://docs.janusgraph.org/advanced-topics/hadoop/) and encountered several issues. Generally, I kept having memory issues as the partitions were too big to be loaded into my spark executors (which I increased up to 16GB per executor).

 

After analysing the code, I found two parameters that could be used to further subsplit the partitions: cassandra.input.split.size and cassandra.input.split.size_mb. However, when trying to use these parameters, and debugging when the memory issues persisted, I noticed several bugs in the underlying org.apache.cassandra.hadoop.cql3.CqlInputFormat used to load the data. I posted the question on the DataStax community forums (see https://community.datastax.com/questions/10153/how-to-control-partition-size-when-reading-data-wi.html). There it was ultimately suggested that I migrate to the spark-cassandra-connector because the issues I encountered were probably bugs, but that code was legacy (and probably not maintained anymore).

 

In the meantime, I reimplemented the InputFormat classes in my app to fix the issues, and testing so far showed that this now works as intended. However, I was wondering the following:

 

1. Does anyone else have any experience with using Apache Spark, Janusgraph, and graphs too big to fit into memory without subsplitting? Did you also encounter this issue? If so, how did you deal with it?

2. Is there an "official" solution to this issue?

3. Are there any plans to migrate to the spark-cassandra-connector for this use case?

 

Thanks,

 

Mladen
