parallelNoncontendedLockStressTest failure?


Demai <nid...@...>
 

I have run the test suite several times, and this test fails every time (sometimes under hbase98, sometimes under hbase10).

the failure message is:
  HBaseLockStoreTest>LockKeyColumnValueStoreTest.parallelNoncontendedLockStressTest:364 expected:<100> but was:<81>

The surefire report is attached.

I suspect that lock contention or a memory issue is causing the failure. Also, I am running on a less powerful MacBook; maybe I need to increase the default timeout so that all 100 threads can finish?

Thanks for any pointers.

Demai


Demai <nid...@...>
 

I got a chance to dig further into this. The failure is caused by a permanent locking exception. I am not familiar enough with the JanusGraph logic to judge whether this exception is expected behavior in this test scenario, though section 27.1, "Data Consistency", of the documentation does warn about the robustness of this locking. I have been able to reproduce the failure consistently over the past few days.

LockKeyColumnValueStoreTest#LockStressor.run() tolerates TemporaryLockingException, but not PermanentLockingException

...
                } catch (TemporaryLockingException e) {
                    temporaryFailures++;
                } catch (Throwable t) {
                    log.error("Unexpected locking-related exception on iteration " + (opIndex + 1) + "/" + opCount, t);
...
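
To make the arithmetic behind "expected:<100> but was:<81>" concrete, here is a minimal, self-contained model of that counting logic. It is only a sketch, not the JanusGraph test itself: the class LockStressSketch, the LockOp interface, and the two exception classes are hypothetical stand-ins, and it assumes that a stressor which hits an unexpected exception stops and is not counted as succeeded.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the stress test's counting logic (not JanusGraph code).
class LockStressSketch {
    // Stand-ins for the JanusGraph exception types named in the thread.
    static class TemporaryLockingException extends Exception {
        TemporaryLockingException(String message) { super(message); }
    }
    static class PermanentLockingException extends Exception {
        PermanentLockingException(String message) { super(message); }
    }

    // Hypothetical lock operation; the real test locks through the store under test.
    interface LockOp {
        void lock(int thread, int opIndex)
                throws TemporaryLockingException, PermanentLockingException;
    }

    // Runs threadCount stressors and returns how many finished all opCount iterations.
    static int runStressors(int threadCount, int opCount, LockOp op) {
        AtomicInteger succeeded = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(threadCount);
        for (int t = 0; t < threadCount; t++) {
            final int thread = t;
            new Thread(() -> {
                try {
                    for (int opIndex = 0; opIndex < opCount; opIndex++) {
                        try {
                            op.lock(thread, opIndex);
                        } catch (TemporaryLockingException e) {
                            // Tolerated: move on to the next iteration.
                        }
                    }
                    succeeded.incrementAndGet();
                } catch (Throwable e) {
                    // Assumption: any other exception ends this stressor uncounted.
                } finally {
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for stressors", e);
        }
        return succeeded.get();
    }
}
```

Under these assumptions, if 19 of 100 stressors receive a PermanentLockingException, runStressors returns 81, which matches the shape of the assertion failure reported above. Temporary failures alone leave the count at 100.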


The log shows either 'permanent locking failure' or 'Local lock contention'; both look like a PermanentLockingException thrown from AbstractLocker#writeLock():

        ...
            } catch (TemporaryBackendException tse) {
                throw new TemporaryLockingException(tse);
        ...
            } catch (Throwable t) {
                throw new PermanentLockingException(t);
            } finally {
                ...
            }
        } else {
            // Fail immediately with no retries on local contention
            throw new PermanentLockingException("Local lock contention");



On Tuesday, September 12, 2017 at 3:03:32 PM UTC-7, Demai wrote: