Timed out waiting for next row data


HadoopMarc <bi...@...>
 

Hi Toom,

Thanks for reporting back. This sounds like a JanusGraph issue to me, because JanusGraph ought to handle this fairly common situation (a single vertex with millions of edges).

Removing the vertex or its edges, and possibly the corrupted index, and then reindexing should work (but I have no experience with this myself). To prevent future cases, you would have to filter supernodes before ingestion into JanusGraph (often, supernodes are not really meaningful and can be remodelled).
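
As a rough illustration of filtering supernodes before ingestion, the sketch below counts the degree of every vertex in an edge list and drops the edges that touch a vertex above a threshold. It is plain Python and does not use any JanusGraph API; the function names, the `(source, target)` edge format and the threshold are assumptions made for this example only.

```python
from collections import Counter


def find_supernodes(edges, max_degree):
    """Return {vertex_id: degree} for vertices with more than max_degree edges.

    `edges` is an iterable of (source, target) pairs (hypothetical input format).
    Both endpoints are counted, since an edge adds to the adjacency of each one.
    """
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return {v: d for v, d in degree.items() if d > max_degree}


def drop_supernode_edges(edges, supernodes):
    """Keep only the edges whose endpoints are both outside `supernodes`."""
    return [(s, d) for s, d in edges if s not in supernodes and d not in supernodes]


# Tiny example: "hub" has 5 edges, every other vertex has at most 2.
edges = [("hub", f"v{i}") for i in range(5)] + [("v0", "v1")]
supernodes = find_supernodes(edges, max_degree=3)
kept = drop_supernode_edges(edges, supernodes)
print(supernodes)
print(kept)
```

In practice the flagged vertices would more likely be remodelled (for example split by time period or category) than silently dropped; the threshold depends on the data and is not a JanusGraph limit.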

Other ideas:
  • You could try repeating the batch-loading from the start with the index predefined. Possibly, this would spread the load of indexing the 3M edges more evenly than reindexing does (unless the 3M edges appear in the same loading transaction).
  • You could check whether the indexing job is affected by the query.* config properties. It is not clear to me why retrieving 3M rows from Cassandra would take minutes, unless they are retrieved via individual calls over the network.
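
A back-of-the-envelope calculation shows how paged retrieval alone could add up to minutes. The row count is the one reported in this thread; the page sizes and the 5 ms round-trip time are assumptions for illustration, not values measured on Toom's cluster, and server-side read time is ignored.

```python
import math


def estimated_fetch_seconds(rows, page_size, round_trip_ms):
    """Return (number_of_pages, total_seconds) if every page costs one round trip."""
    pages = math.ceil(rows / page_size)
    return pages, pages * round_trip_ms / 1000.0


rows = 3_275_110  # rows returned for the single key, as reported in the thread
for page_size in (100, 5_000):  # assumed page sizes
    pages, seconds = estimated_fetch_seconds(rows, page_size, round_trip_ms=5)
    print(f"page_size={page_size}: {pages} round trips, about {seconds:.1f} s")
```

Under these assumptions, small pages turn one wide row into tens of thousands of sequential network calls, which is consistent with the "several minutes" observed.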
Best wishes, Marc


On Thursday, 24 September 2020 at 15:22:29 UTC+2, spars...@... wrote:

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/af01a4ba-4c57-4f50-81ce-c24a2cb38f70n%40googlegroups.com.


sparshneel chanchlani <sparshneel...@...>
 

Toom,
Have you explored an index backend option (e.g. Elasticsearch)? Also, can you share your Gremlin query? It should start out from a vertex.

Thanks,
Sparshneel

On Thu, Sep 24, 2020, 6:32 PM toom <to...@...> wrote:



toom <to...@...>
 

Hello,

I’ve identified the probable source of my problem. In the Cassandra database, I found a huge number of rows with the same key. The query “SELECT * FROM edgestore WHERE key=0x880000000F15B700” returns 3,275,110 rows. The related vertex (8098603144) has few attributes but more than 3M edges.

Iterating over these rows takes Cassandra several minutes. During the scan job, no new entry is sent to the executor until all rows of this vertex have been exhausted [1], so a timeout is raised [2].

I'm not sure that I can run a reindex job after removing these edges. Moreover, I can't ensure that this edge case will not happen again.

What is the best option: preventing a huge number of edges per vertex (what is the maximum?) or increasing the timeout (currently the timeout [3] doesn't seem to be configurable)?

It is probably difficult to implement, but I think the timeout should occur only when no new Cassandra row arrives.
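
To make the difference between the two timeout policies concrete, here is a toy simulation. It is not JanusGraph code: the function name, the `(time, key)` event format and the 180-second timeout are assumptions chosen for illustration. With the "per key" policy, progress is only visible when the last row of a key has been read; with the "per row" policy suggested above, every row read from Cassandra resets the timer.

```python
def first_timeout(arrivals, timeout, per_row):
    """Return the time at which the waiting side gives up, or None if it never does.

    arrivals: list of (time_in_seconds, key), sorted by time, one item per backend row.
    per_row:  True  -> every row counts as progress (proposed policy)
              False -> only the last row of each key counts as progress
    """
    if per_row:
        progress = [t for t, _ in arrivals]
    else:
        # An entry becomes visible only once all rows of its key have been read.
        progress = [
            t
            for i, (t, key) in enumerate(arrivals)
            if i + 1 == len(arrivals) or arrivals[i + 1][1] != key
        ]
    last = 0.0
    for t in progress:
        if t - last > timeout:
            return last + timeout  # the waiter gave up before this event arrived
        last = t
    return None


# One small vertex, then a "supernode" whose 300 rows arrive one per second,
# then another small vertex.
arrivals = [(1.0, "a")] + [(1.0 + i, "b") for i in range(1, 301)] + [(302.0, "c")]

print(first_timeout(arrivals, timeout=180.0, per_row=False))
print(first_timeout(arrivals, timeout=180.0, per_row=True))
```

In this simulation the per-key policy gives up while the supernode is still being read, although rows keep arriving, whereas the per-row policy never fires because no gap between rows exceeds the timeout.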


Regards,

Toom.


[1] https://github.com/JanusGraph/janusgraph/blob/v0.5.2/janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/CQLResultSetKeyIterator.java#L69

[2] https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/scan/StandardScannerExecutor.java#L154

[3] https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/scan/StandardScannerExecutor.java#L45


On Wednesday, September 9, 2020 at 10:50:08 AM UTC+2 toom wrote:


toom <to...@...>
 

Hi,

I have a JanusGraph database with a Cassandra backend and no indexing backend (currently I only use composite indexes).
I batch-loaded data (16M+ vertices). Then I installed composite indexes on vertices, but during the reindexing I got errors:

[ERROR] from org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor in Thread-7 [|] Exception occurred during job execution: {}
org.janusgraph.diskstorage.TemporaryBackendException: Timed out waiting for next row data - storage error likely
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor.run(StandardScannerExecutor.java:155)
        at java.lang.Thread.run(Thread.java:748)
[ERROR] from org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor in Thread-11 [|] Processing thread interrupted while waiting on queue or processing data
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor$Processor.run(StandardScannerExecutor.java:282)
[ERROR] from org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor in Thread-9 [|] Data-pulling thread interrupted while waiting on queue or data
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
        at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor$DataPuller.run(StandardScannerExecutor.java:340)

It seems that the process always stops at the same row (scanMetrics.getCustom(IndexRepairJob.ADDED_RECORDS_COUNT) gives the same value each time the error occurs).
I suspect a ghost vertex problem [1] but I don't know how to fix it. I tried to use GhostVertexRemover [2] but I got the same error (Timed out waiting for next row data).
My Cassandra cluster looks healthy (no errors in the log) and the query g.V().count() works.

What can I do?

Toom.

[1] https://github.com/JanusGraph/janusgraph/issues/1750
[2] https://github.com/JanusGraph/janusgraph/blob/v0.5.2/janusgraph-core/src/main/java/org/janusgraph/graphdb/olap/job/GhostVertexRemover.java