First, about the Spark RDD: you are absolutely right that RDD.foreachPartition() is the right method to use, my bad. Because it returns void, there are no later Spark steps that could trigger a second execution. But does that mean that your Spark job did not finish successfully, despite the few transaction failures? I would expect Spark to reschedule the corresponding task until it succeeds. The only problem you can then have is that transactions are not properly closed (the reason for the exception you showed?), which is why I suggested catching the exception, rolling back the transaction and raising your own exception towards Spark.
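A minimal sketch of the catch, roll back and rethrow pattern described above. It is not JanusGraph or Spark code: `Tx`, `InMemoryTx` and `PartitionLoader` are hypothetical stand-ins so the example runs on its own, and the assumption is that one partition maps to one transaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a graph transaction (assumption, not the JanusGraph API).
interface Tx {
    void addVertex(String name);
    void commit();
    void rollback();
}

// Hypothetical in-memory transaction, only here to make the sketch runnable.
class InMemoryTx implements Tx {
    private final List<String> store;                 // plays the role of committed data
    private final List<String> pending = new ArrayList<>();
    private boolean open = true;

    InMemoryTx(List<String> store) { this.store = store; }

    public void addVertex(String name) {
        if (!open) throw new IllegalStateException("transaction is closed");
        if (name.isEmpty()) throw new IllegalArgumentException("empty vertex name");
        pending.add(name);
    }
    public void commit()   { store.addAll(pending); pending.clear(); open = false; }
    public void rollback() { pending.clear(); open = false; }
}

class PartitionLoader {
    // One transaction per partition: commit on success, roll back and rethrow on failure.
    static void loadPartition(Iterator<String> rows, Tx tx) {
        try {
            while (rows.hasNext()) {
                tx.addVertex(rows.next());
            }
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            // Rethrow so the caller (Spark, in a real job) sees the task as failed.
            throw new RuntimeException("partition load failed, transaction rolled back", e);
        }
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        loadPartition(Arrays.asList("a", "b").iterator(), new InMemoryTx(store));
        try {
            // The empty name makes this partition fail half-way through.
            loadPartition(Arrays.asList("c", "").iterator(), new InMemoryTx(store));
        } catch (RuntimeException e) {
            System.out.println("failed: " + e.getMessage());
        }
        System.out.println("committed: " + store);
    }
}
```

In a real job the body of `loadPartition` would presumably run inside the function passed to foreachPartition, with a fresh transaction opened per partition, so a retried task starts from a clean state.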
Your other questions.
1) If you use Spark, I would expect that you have a singleton object per Spark executor that holds the JanusGraph connection, and that you manage parallelism on the executor through the number of cores per executor. If you use more than one transaction per Spark task/core, you lose the option to roll back the transaction when needed and have Spark reschedule the task.
2) It is just something that people sometimes complain about. I guess it should be recognizable from the exceptions raised. Of course, it will not hurt to monitor CPU and RAM usage of your Elasticsearch instances. It will only happen if the Elasticsearch cluster is the weakest link in the chain, that is, if JanusGraph and HBase can process more transactions than Elasticsearch can handle.
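For point 1) above, a minimal sketch of a per-JVM singleton that could hold the graph connection on each executor. `GraphConnection`, `GraphHolder` and `ExecutorDemo` are hypothetical names; a real holder would open the JanusGraph instance in place of the stand-in constructor. The sketch relies on Java's lazy class initialization, so the connection is created once, on first use, even when several task threads ask for it at the same time.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive graph connection (assumption).
class GraphConnection {
    static final AtomicInteger OPENED = new AtomicInteger();
    GraphConnection() { OPENED.incrementAndGet(); }   // counts how many were opened
}

// One connection per JVM, i.e. per executor process in this sketch.
class GraphHolder {
    private GraphHolder() {}

    // The nested class is initialized on first access, exactly once and thread-safely.
    private static class Lazy {
        static final GraphConnection INSTANCE = new GraphConnection();
    }

    static GraphConnection get() { return Lazy.INSTANCE; }
}

class ExecutorDemo {
    public static void main(String[] args) throws InterruptedException {
        // Four threads stand in for four cores of one executor.
        Thread[] cores = new Thread[4];
        for (int i = 0; i < cores.length; i++) {
            cores[i] = new Thread(() -> GraphHolder.get());
            cores[i].start();
        }
        for (Thread t : cores) {
            t.join();
        }
        System.out.println("connections opened: " + GraphConnection.OPENED.get());
    }
}
```

Each task would then call `GraphHolder.get()` and open its own transaction on the shared connection, so the number of concurrent transactions follows the number of cores per executor.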
One last remark: it is not unusual for a few Spark tasks to fail; it just happens for all kinds of reasons in complex distributed setups. Your application must simply be able to handle these failures and reschedule the task.
Best wishes, Marc
On Friday, 25 September 2020 at 23:29:16 UTC+2, nar...@... wrote:
I think I got the answer from you: it might be because of too many transactions, or an
indexing backend that cannot keep up with the ingestion. But I have a few questions on this.
I am using JanusGraph client 0.3.2 with HBase (8 region servers) and Elasticsearch.
1) What is the recommended number of transactions per JanusGraph instance? I hope I should be able to reproduce the problem by creating too many transactions, or is there a better way to reproduce and test it?
2) An indexing backend that cannot keep up with the ingestion: any idea in which cases this happens? Please suggest a good way to reproduce and test it.
And thanks for the suggestions on Spark.
Yes, we have enough partitions, each with at most 500 vertices.
We are not using RDD.mapPartitions() exactly, but using
() and the vertices are created in a Spark action/operation, i.e. stream.foreachRDD -> foreachPartition (... creating vertices here ...). Please advise if this is not the right way.
On Friday, September 25, 2020 at 11:39:45 PM UTC+8 HadoopMarc wrote:
It is the responsibility of the application to commit transactions. One example application is Gremlin Server, which can do that for you, but this may not be the most convenient for bulk loading.
If you use Spark, a nice way is to use the RDD.mapPartitions() function. If you have partitions of the size of a single transaction (1000-10000 vertices), you can catch any exceptions, roll back the transaction on failure and commit on success. Spark will automatically retry a failed partition, and by using mapPartitions() you are sure that there is exactly one successful run for any partition.
Reasons for occasional failure may be too large transactions or an indexing backend that cannot keep up with the ingestion. ID block exhaustion generates its own exceptions.
On Friday, 25 September 2020 at 14:52:34 UTC+2, nar...@... wrote:
I am using Spark for parallel processing with a mix of batch loading (at transaction level) and normal transactions.
Case 1: bulk loading at transaction level
txn = janusGraph.buildTransaction().enableBatchLoading().start();
create vertices and edges
Case 2: with a normal transaction
txn = janusGraph.newTransaction();
create vertices and edges
I got the exception below in the middle of processing; the transaction did not commit, so the vertices were not created.
java.lang.IllegalStateException: Cannot access element because its enclosing transaction is closed and unbound
It happens very rarely, and I am not sure in which cases it happens.
Can you please advise whether there is any case where JanusGraph can commit or close a transaction automatically?
We explicitly open, commit and close transactions, so there is no other place where a transaction could be closed or committed in the middle of processing.