As said, to find the root cause, first make sure that failing transactions are rolled back (you did not confirm that you have taken care of that).
1) I would expect that you use just 4 or 8 cores per executor, so this number of parallel transactions is insignificant to janusgraph. If there were a problem, spark would simply raise an OOM exception. Also note that janusgraph instances in different executors are mostly independent, apart from the id-manager and their load on the backends.
2) Good question on read/write load in elastic. My mind was on the writing, because of the processing related to indexing, but during ingestion of edges there may also be a lot of lookups for vertices that are not present in application memory or in the janusgraph cache. So, I guess both reads and writes matter.
Best wishes, Marc
On Saturday, 26 September 2020 at 14:44:23 UTC+2, nar...@... wrote:
Hi Marc, thanks for replying so quickly
I agree with your remark: a spark task is able to retry, hence it can be handled easily.
I am mostly focusing on the root cause, so that we can fix the actual problem.
My comments on the same questions:
1) Yes, the janusgraph instance is a singleton, and we create one transaction per task.
Since "too many transactions" was raised, I am just checking whether it will be an issue if we have many tasks/cores (indirectly, too many transactions per single janusgraph instance at the same time), and whether there is any limit on this.
2) Yes, good to monitor elastic.
Will it happen only for writes, or can it happen for reads as well?
The bad thing is that no other errors are printed apart from the IllegalStateException, and I am not able to replicate it to see exactly what is happening.
On Sat, 26 Sep 2020, 7:47 pm HadoopMarc, <b...@...
First about the spark RDD: you are absolutely right that RDD.foreachPartition() is the right method to use, my bad. Because it returns void, there are no later spark steps that could trigger a second execution. But does that mean that your spark job did not finish successfully, despite only a few transaction failures? I would expect spark to reschedule the corresponding task until it succeeds. The only problem you can then have is that transactions are not properly closed (the reason for the exception you showed?), which is why I suggested catching the exception, rolling back the transaction and raising your own exception towards spark.
Your other questions.
1) If you use spark, I would expect that you have a singleton object per spark executor that contains the janusgraph connection and that you manage parallelism on the spark executor with the number of cores per executor. If you use more transactions per spark task/core, you lose the option to roll back the transaction if needed and have spark reschedule the task.
2) It is just something that people sometimes complain about. I guess this should be recognizable from the exceptions raised. Of course it will not hurt to monitor CPU and ram usage of your elasticsearch instances. It will only happen if the elastic cluster is the weakest link in the chain, that is if janusgraph and HBase can process more transactions than elastic can handle.
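For point 1) above, a minimal sketch of a holder that opens a resource once per JVM (one spark executor is one JVM) and hands the same instance to every task. The class name `LazySingleton` and the `opener` parameter are hypothetical; the sketch is generic and does not depend on janusgraph or spark.

```java
import java.util.function.Supplier;

/** Hypothetical helper: opens a shared resource at most once, on first use. */
public final class LazySingleton<T> {
    private final Supplier<T> opener; // how to open the resource (assumption: called at most once)
    private volatile T instance;      // volatile so other threads see the fully built object

    public LazySingleton(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, opening it on the first call (double-checked locking). */
    public T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = opener.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

In an application this would typically sit in a static field, for example `static final LazySingleton<JanusGraph> GRAPH = new LazySingleton<>(() -> JanusGraphFactory.open("conf/graph.properties"));` where the properties path is a placeholder. Each task then calls `GRAPH.get()` and opens its own transaction on the shared instance.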
Last remark: it is not unusual that a few spark tasks fail. It is just something that happens for all kinds of reasons in complex distributed setups. Your application must simply be able to handle these failures and reschedule the task.
Best wishes, Marc
On Friday, 25 September 2020 at 23:29:16 UTC+2, nar...@... wrote:
I think I got the answer from you. It might be because of too many transactions or an indexing backend that cannot keep up with the ingestion. But I have a few questions on this.
I am using janusgraph client 0.3.2 with HBase (8 region servers) and elastic.
1) What is the recommended number of transactions per janusgraph instance? I hope I should be able to replicate the problem by creating too many transactions, or is there a better way to replicate and test?
2) An indexing backend that cannot keep up with the ingestion: any idea in which cases this will happen? Please suggest a good way to replicate and test.
And thanks for the suggestions on spark.
Yes, we have enough partitions, each with 500 vertices max.
I am not using exactly RDD.mapPartitions(), but using
() and the vertices are created in a spark action/operation, i.e. stream.foreachRDD -> foreachPartition (.. creating vertices here...). Please tell me if this is not the right way.
On Friday, September 25, 2020 at 11:39:45 PM UTC+8 HadoopMarc wrote:
It is the responsibility of the application to commit transactions. One application example is gremlin-server, which can do that for you, but this may not be the most convenient for bulk loading.
If you use spark, a nice way is to use the RDD.mapPartitions() function. If you have partitions of the size of a single transaction (1000-10000 vertices), you can catch any exceptions, roll back the transaction on failure and commit on success. Spark will automatically retry a failed partition, and by using mapPartitions() you are sure that there is exactly one successful run for any partition.
Reasons for occasional failure may be too large transactions or an indexing backend that cannot keep up with the ingestion. ID block exhaustion generates its own exceptions.
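The commit-on-success, rollback-on-failure handling described above can be sketched as follows. This is only an illustration of the control flow: `Tx` is a hypothetical stand-in for a graph transaction (in a real job it would be the transaction opened on the janusgraph instance), `RecordingTx` is an in-memory fake for demonstration, and `loadPartition` is an assumed name for the body one would call from the per-partition function.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class PartitionLoader {

    /** Hypothetical stand-in for a graph transaction. */
    public interface Tx {
        void addVertex(String name);
        void commit();
        void rollback();
    }

    /** In-memory fake, used only to demonstrate the control flow. */
    public static class RecordingTx implements Tx {
        public final List<String> staged = new ArrayList<>();
        public boolean committed;
        public boolean rolledBack;

        public void addVertex(String name) {
            if (name == null) throw new IllegalArgumentException("null vertex name");
            staged.add(name);
        }
        public void commit() { committed = true; }
        public void rollback() { staged.clear(); rolledBack = true; }
    }

    /**
     * Loads one partition in exactly one transaction.
     * Commits on success. On any failure it rolls back and rethrows,
     * so that the calling task fails and can be rescheduled.
     */
    public static int loadPartition(Supplier<Tx> txFactory, Iterator<String> records) {
        Tx tx = txFactory.get();
        int count = 0;
        try {
            while (records.hasNext()) {
                tx.addVertex(records.next());
                count++;
            }
            tx.commit();
            return count;
        } catch (RuntimeException e) {
            try {
                tx.rollback();
            } catch (RuntimeException rollbackFailure) {
                e.addSuppressed(rollbackFailure); // keep the original cause visible
            }
            throw new IllegalStateException("Partition load failed; transaction rolled back", e);
        }
    }
}
```

Rethrowing after the rollback matters: if the exception were swallowed, the task would look successful and the partition would never be retried.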
On Friday, 25 September 2020 at 14:52:34 UTC+2, nar...@... wrote:
I am using spark for parallel processing with a mix of batch loading (at transaction level) and normal transactions.
case 1# with bulk loading at transaction level (used in some cases)
txn = janusGraph.buildTransaction().enableBatchLoading().start();
create vertices and edges
case 2# with normal transaction
txn = janusGraph.newTransaction();
create vertices and edges
I got the exception below in the middle of processing; the transaction did not commit, hence the vertices were not created.
java.lang.IllegalStateException: Cannot access element because its enclosing transaction is closed and unbound
It happens very rarely and I am not sure in which cases it happens.
Can you please suggest whether there is any case where janusgraph can commit/close a transaction automatically?
We are explicitly opening, committing and closing txns, so there is no other place where a transaction could be closed/committed in the middle of processing.