Getting RowTooBigException while firing janus query backed by hbase


Priyanka Jindal <priyanka...@...>
 

Hi 

In my case, JanusGraph storage is backed by HBase and I am firing the following query:

JanusGraphQuery query = tx.query().has("key1", "v1").has("key2", "v2").limit(FIXED_VALUE);
for (JanusGraphVertexProperty element : query.properties()) {
    // process each matching property
}

The queried keys are covered by a composite index, and the vertices are not partitioned.

So now, while calling query.properties(), it fails with the following exception from the HBase side: "org.apache.hadoop.hbase.regionserver.RowTooBigException: org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 1073741824, but the row is bigger than that."

As per my understanding, the reason is that the row being fetched from HBase is larger than the threshold configured in HBase, and a row corresponds to the edges and properties of a single vertex.
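For reference, the 1073741824 in the exception message is the default value of HBase's hbase.table.max.rowsize setting (1 GiB), which region servers enforce per fetched row. A minimal hbase-site.xml fragment for raising it might look like the sketch below; the 2 GiB value is purely illustrative, and the region servers need to pick up the change:

<!-- hbase-site.xml on the region servers; 2147483648 (2 GiB) is an illustrative value -->
<property>
  <name>hbase.table.max.rowsize</name>
  <value>2147483648</value>
</property>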

So my questions are:
1. When the above query is invoked, how exactly does it work? Does it try to fetch all the vertices and then filter them on the passed key-value pairs?
2. I have tried increasing the HBase threshold value, but even then I get the same error. What could be the reason for that?


HadoopMarc <bi...@...>
 

Hi,

Once you have your multithreaded transaction, you can create a TraversalSource from that:

threadedGraph = graph.tx().createThreadedTx();  // multithreaded transaction
g = threadedGraph.traversal();                  // TraversalSource bound to that transaction
g.V().has("key1", "v1").has("key2", "v2").valueMap("key1", "key2").toList();

Using the TraversalSource you can specify which properties to return and avoid exceeding the row limit.

Best wishes,   Marc



Priyanka Jindal <priyanka...@...>
 

Hi HadoopMarc,

Thanks for the reply. But could you please explain how this differs from the query I posted above? What does this query do internally such that it avoids the RowTooBigException?




HadoopMarc <bi...@...>
 

Hi,

The has() steps only filter the vertices that are returned. The valueMap() step filters the list of properties per vertex. Your query does not have the equivalent of the valueMap() step: the properties() step in your query returns the entire list of properties per vertex.
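To make the contrast concrete, a minimal sketch along the lines of this explanation, reusing the placeholder keys from this thread:

// Returns every property of each matching vertex, so the whole HBase row is read:
g.V().has("key1", "v1").has("key2", "v2").properties().toList();

// Returns only the named properties, so only a selection of the row is needed:
g.V().has("key1", "v1").has("key2", "v2").valueMap("key1", "key2").toList();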

Best wishes,    Marc



HadoopMarc <bi...@...>
 

A bit more explicit: the difference lies in the scan queries that JanusGraph fires towards the HBase storage backend: all properties vs. a selection of properties.



Priyanka Jindal <priyanka...@...>
 

Marc,

I have tried the query you suggested:

valueMap = g.V().has("key1", "v1").has("key2", "v2").valueMap("key1", "key2")

But I got the following exception:

Could not find a suitable index to answer graph query and graph scans are disabled: [(key1= v1 AND key2= v2 AND _xidKey = 0)]:VERTEX

It seems I need another index for this. Is that correct?
If yes, can the problem not be solved with the existing index itself? If I create a new index, it will require reindexing.




HadoopMarc <bi...@...>
 

Hi,

The .has("key1","v1").has("key2","v2") part is just taken from your original query and apparently does not relate to a real vertex. Just replace it with a single or double has() step that is consistent with the indices that are present for your graph.
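If it helps, the indexes that are present can be listed through the management API; a minimal sketch, assuming a JanusGraph instance named graph:

JanusGraphManagement mgmt = graph.openManagement();
for (JanusGraphIndex index : mgmt.getGraphIndexes(Vertex.class)) {
    // index.name() is the index name, getFieldKeys() the property keys it covers
    System.out.println(index.name() + " -> " + java.util.Arrays.toString(index.getFieldKeys()));
}
mgmt.rollback();  // no schema changes were made, so roll the management transaction back

Passing JanusGraphVertexProperty.class instead of Vertex.class lists the indexes defined on properties.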

HTH,    Marc



HadoopMarc <bi...@...>
 

Hi,

You sent me the following additional information:
"""
So I have one graph index of composite type on a Janus property:

mgmt.buildIndex("idxAllPropertiesByResourceProviderXidKey", JanusGraphVertexProperty.class)
    .addKey(resourceNameKey).addKey(providerKey).addKey(xidKey).buildCompositeIndex();

And when I fire the query to fetch properties and iterate over them with a certain condition, like:

tx.query()
    .has(resourceNameKey, "resourceKey")
    .has(providerKey, "providerKey")
    .has(xidKey, "naXid")
    .limit(elementBatchSize).properties()
"""
So, the traversal should be:

threadedGraph = graph.tx().createThreadedTx();
g = threadedGraph.traversal();
g.V().has(resourceNameKey, "v1").has(providerKey, "v2").has(xidKey, "v3").valueMap("smallProperty1", "smallProperty2").toList();

where ("v1", "v2", "v3") is a set of property values of a vertex that is known to exist and can thus be looked up in your graph, and ("smallProperty1", "smallProperty2") is the set of property keys whose values you want to look up and which do not hold enormous blobs that exceed the row size limit.

Best wishes,   Marc


Priyanka Jindal <priyanka...@...>
 

Hi Marc,

I am not trying to fetch vertices. I need to fetch properties by querying them on their meta-properties.
I have one graph index of composite type on a Janus property:

mgmt.buildIndex("idxAllPropertiesByResourceProviderXidKey", JanusGraphVertexProperty.class)
    .addKey(resourceNameKey).addKey(providerKey).addKey(xidKey).buildCompositeIndex();

I want to fire a query that fetches the properties carrying the 3 meta-properties resourceNameKey, providerKey and xidKey:

tx.query()
    .has(resourceNameKey, "resourceKey")
    .has(providerKey, "providerKey")
    .has(xidKey, "naXid")
    .limit(elementBatchSize).properties()

The above query should return the properties matching the passed meta-property values. So how can your query help here?

