Re: Timed out waiting for next row data


toom <to...@...>
 

Hello,

I've identified the probable source of my problem. On the Cassandra database, I found a huge number of rows with the same key: the query "SELECT * FROM edgestore WHERE key=0x880000000F15B700" returns 3,275,110 rows. The corresponding vertex (8098603144) has few properties but more than 3M edges.

Iterating over these rows takes Cassandra several minutes. During the scan job, no new entry is sent to the executor until all rows of this vertex have been exhausted [1], so a timeout is raised [2].

I'm not sure that I can run a reindex job after removing these edges. Moreover, I can't ensure that this edge case won't happen again.

What is the best option? Preventing a huge number of edges per vertex (what is the maximum?), or increasing the timeout (currently the timeout [3] doesn't seem to be configurable)?

It is probably difficult to implement, but I think the timeout should occur only when no new Cassandra row is arriving.
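To make the failure mode concrete, here is a minimal standalone sketch (not JanusGraph code; the class name and the shortened timings are my own) of the pattern I believe StandardScannerExecutor uses: a data-pulling thread puts rows on a blocking queue while the executor polls with a fixed timeout. If the puller spends longer than the timeout iterating a single huge partition before it can emit the next row, poll() returns null and the scan fails, even though the backend is still making progress:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ScanTimeoutSketch {

    /** Returns the next row, or null if the executor's poll timed out first. */
    static String pollForRow(long pullerDelayMs, long pollTimeoutMs) throws InterruptedException {
        BlockingQueue<String> rowQueue = new LinkedBlockingQueue<>();

        // "Data puller" thread: the delay stands in for the multi-minute
        // iteration over a 3M-row key before the next row can be queued.
        Thread puller = new Thread(() -> {
            try {
                Thread.sleep(pullerDelayMs);
                rowQueue.put("next-row");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        puller.start();

        // The executor waits a fixed time for the next row; if nothing
        // arrives it gives up, exactly like the error I'm seeing.
        String row = rowQueue.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);

        puller.interrupt(); // mirrors the executor interrupting its worker threads
        puller.join();
        return row;
    }

    public static void main(String[] args) throws InterruptedException {
        // Puller needs 500 ms but the executor only waits 100 ms -> timeout.
        System.out.println(pollForRow(500, 100) == null
                ? "timed out waiting for next row"
                : "got row");
    }
}
```

This is why I suggest the timeout should reset while the puller is still alive and working, rather than firing on a fixed wait for the next queued row.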


Regards,

Toom.


[1] https://github.com/JanusGraph/janusgraph/blob/v0.5.2/janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/CQLResultSetKeyIterator.java#L69

[2] https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/scan/StandardScannerExecutor.java#L154

[3] https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/diskstorage/keycolumnvalue/scan/StandardScannerExecutor.java#L45


On Wednesday, September 9, 2020 at 10:50:08 AM UTC+2 toom wrote:
Hi,

I have a JanusGraph database with a Cassandra storage backend and no indexing backend (currently I only use composite indexes).
I batch-loaded data (16M+ vertices). Then I installed composite indexes on vertices, but during reindexing I got errors:

[ERROR] from org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor in Thread-7 [|] Exception occurred during job execution: {}
org.janusgraph.diskstorage.TemporaryBackendException: Timed out waiting for next row data - storage error likely
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor.run(StandardScannerExecutor.java:155)
        at java.lang.Thread.run(Thread.java:748)
[ERROR] from org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor in Thread-11 [|] Processing thread interrupted while waiting on queue or processing data
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor$Processor.run(StandardScannerExecutor.java:282)
[ERROR] from org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor in Thread-9 [|] Data-pulling thread interrupted while waiting on queue or data
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
        at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor$DataPuller.run(StandardScannerExecutor.java:340)

It seems that the process always stops at the same row (scanMetrics.getCustom(IndexRepairJob.ADDED_RECORDS_COUNT) gives the same value when the error occurs).
I suspect a ghost vertex problem [1] but I don't know how to fix it. I tried to use GhostVertexRemover [2] but I got the same error (Timed out waiting for next row data).
My Cassandra looks healthy (no errors in the logs) and the query g.V().count() works.

What can I do?

Toom.
