Getting org.janusgraph.graphdb.database.idassigner.IDPoolExhaustedException consistently



I am getting the exception below while ingesting data into an existing graph:

Job aborted due to stage failure: Task 349 in stage 2.0 failed 10 times, most recent failure: Lost task 349.9 in stage 2.0 (TID 2524, dproc-connect-graph1-prod-us-sw-xwv9.c.zeotap-prod-datalake.internal, executor 262): org.janusgraph.graphdb.database.idassigner.IDPoolExhaustedException: Could not find non-exhausted partition ID Pool after 1000 attempts

The value of `ids.block-size` is set to 5000000 (5M) and I am using Spark for data loading (around 300 executors per run).

Could you please suggest the configuration which can fix this issue?




There does not seem to be much that helps in finding a root cause (no similar questions or issues in history). The most helpful thing I found is the following javadoc:

Assuming that you use this default SimpleBulkPlacementStrategy, what value do you use for ids.num-partitions? The default number might be too small. At the beginning of a Spark job, the tasks can be more or less synchronized, that is, they finish after about the same amount of time and then cause congestion (task number 349 ...). If this is the case, other configs could help too:

ids.renew-percentage: Increasing this value avoids congestion a bit, but it cannot have a high impact.
ids.flush: I assume you did not change the default value "true".
ids.authority.conflict-avoidance-mode: Undocumented, but talks about contention during ID block reservation.
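
As a rough sketch only, these options could be set in the graph's properties file along the following lines. All values are illustrative assumptions, not tested recommendations, and GLOBAL_AUTO is just one of the conflict avoidance modes that I believe exist; please verify names, allowed values and defaults against the configuration reference for your JanusGraph version before using any of them.

```properties
# Illustrative values only (assumptions, not tuned for any real workload).

# Number of partitions used for placing new vertices.
# Hypothetical value; the point is only that it is larger than the default.
ids.num-partitions=30

# Fraction (between 0 and 1) of the current ID block that is still unused
# when the next block starts being reserved. A higher value renews earlier.
ids.renew-percentage=0.5

# Left at its default.
ids.flush=true

# Assumed mode name; check which modes your version supports.
ids.authority.conflict-avoidance-mode=GLOBAL_AUTO
```

Some of these options may only be changeable while the whole cluster is offline, so it is worth checking the mutability level of each one before changing it on an existing graph.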

Best wishes,
Marc