Priyanka Jindal <priyanka...@...>
Hi,
In my case JanusGraph storage is backed by HBase and I am firing the following query:
    query = tx.query().has("key1", "v1").has("key2", "v2").limit(FIXED_VALUE);
    for (T element : query.properties()) {
        // ...
    }
The queried keys are covered by a composite index, and the vertex is not partitioned.
Now, when query.properties() is called, it fails with the following exception from the HBase side: "org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 1073741824, but the row is bigger than that."
As per my understanding, the reason is that the row being fetched from HBase exceeds the threshold configured in HBase, and a row corresponds to the edges and properties of a single vertex.
So my questions are: 1. When the above query is invoked, how exactly does it work? Does it fetch all the vertices and then filter on the passed key-value pairs? 2. I have tried increasing the HBase threshold value, but I still get the same error. What could be the reason for that?
Hi,
Once you have your multithreaded transaction, you can create a TraversalSource from it:
    threadedGraph = graph.tx().createThreadedTx();
    g = threadedGraph.traversal();
    g.V().has("key1", "v1").has("key2", "v2").valueMap("key1", "key2").toList();
Using the TraversalSource you can specify which properties to return, and so avoid exceeding the row size limit.
Best wishes, Marc
Priyanka Jindal <priyanka...@...>
Hi HadoopMarc,
Thanks for the reply. But could you please explain how this differs from the query I posted above? What does this query do internally such that it avoids the RowTooBigException?
Hi,
The has() steps only filter the vertices that are returned. The valueMap() step filters the list of properties per vertex. Your query does not have the equivalent of the valueMap() step: the properties() step in your query returns the entire list of properties per vertex.
Best wishes, Marc
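To make Marc's point concrete outside of JanusGraph, here is a minimal, hypothetical Java sketch (the class and method names are illustrative, not JanusGraph APIs): a storage "row" maps property keys to values; properties() behaves like fetching the whole row, while valueMap("key1", "key2") behaves like projecting only the named columns, so oversized values never have to travel to the client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not JanusGraph code): contrast fetching a full
// storage row with projecting only the requested property keys.
public class ProjectionSketch {

    // Return the entire row, oversized blobs included (the properties() case).
    static Map<String, Object> allProperties(Map<String, Object> row) {
        return row;
    }

    // Return only the requested keys, skipping everything else
    // (the valueMap("key1", "key2") case).
    static Map<String, Object> project(Map<String, Object> row, String... keys) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String key : keys) {
            if (row.containsKey(key)) {
                out.put(key, row.get(key));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("key1", "v1");
        row.put("key2", "v2");
        row.put("hugeBlob", new byte[1024 * 1024]); // stands in for the oversized data

        System.out.println(allProperties(row).keySet()); // [key1, key2, hugeBlob]
        System.out.println(project(row, "key1", "key2").keySet()); // [key1, key2]
    }
}
```

The real projection happens server-side in the storage backend, of course; the sketch only shows why the amount of data read per vertex differs between the two query shapes.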
A bit more explicit: the difference lies in the scan queries that JanusGraph fires towards the HBase storage backend: all properties vs. a selection of properties.
Priyanka Jindal <priyanka...@...>
Marc,
I have tried the query you suggested:
    valueMap = g.V().has("key1", "v1").has("key2", "v2").valueMap("key1", "key2")
But I got the following exception:
    Could not find a suitable index to answer graph query and graph scans are disabled: [(key1 = v1 AND key2 = v2 AND _xidKey = 0)]:VERTEX
It seems I need another index for this. Is that correct? If yes, can the problem not be solved with the existing index? If I create a new index, it will require reindexing.
Hi,
The .has("key1","v1").has("key2","v2") part was just taken from your original query and apparently does not correspond to a real index on your graph. Just replace it with a single or double has() step that is consistent with the indices that are present for your graph.
HTH, Marc
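Why the has() predicates must line up with an existing index can be sketched, again outside JanusGraph, with a hypothetical Java example (the class is illustrative, not JanusGraph internals): a composite index on (key1, key2) is essentially a map keyed by the exact tuple of indexed values, so a query can only use it when its equality predicates supply values for exactly those keys; anything else would require a full graph scan, which is disabled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not JanusGraph internals): a composite index is a
// lookup table keyed by the tuple of indexed property values.
public class CompositeIndexSketch {

    // (value-for-key1, value-for-key2) -> matching vertex ids
    private final Map<List<String>, List<Long>> index = new HashMap<>();

    void put(String v1, String v2, long vertexId) {
        index.computeIfAbsent(List.of(v1, v2), k -> new ArrayList<>()).add(vertexId);
    }

    // An exact-tuple lookup: answers the query only when the predicates
    // match the indexed keys; a miss yields nothing rather than a scan.
    List<Long> lookup(String v1, String v2) {
        return index.getOrDefault(List.of(v1, v2), List.of());
    }

    public static void main(String[] args) {
        CompositeIndexSketch idx = new CompositeIndexSketch();
        idx.put("v1", "v2", 42L);

        System.out.println(idx.lookup("v1", "v2")); // [42] -> the index answers the query
        System.out.println(idx.lookup("v1", "zz")); // []   -> no index entry, no scan
    }
}
```

In the thread's case the exception shows a third predicate (_xidKey) in play, so the two-key has() combination did not match any index tuple the graph actually maintains.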
Hi,
You sent me the following additional information: """
So I have one graph index of composite type on a Janus property:
    mgmt.buildIndex("idxAllPropertiesByResourceProviderXidKey", JanusGraphVertexProperty.class)
        .addKey(resourceNameKey).addKey(providerKey).addKey(xidKey)
        .buildCompositeIndex();
And when I fire the query to fetch properties and iterate over them with a certain condition, like:
    tx.query()
        .has(resourceNameKey, "resourceKey")
        .has(providerKey, "providerKey")
        .has(providerKey, "naXid")
        .limit(elementBatchSize).properties()
"""
So, the traversal should be:
    threadedGraph = graph.tx().createThreadedTx();
    g = threadedGraph.traversal();
    g.V().has(resourceNameKey, "v1").has(providerKey, "v2").has(xidKey, "v3")
        .valueMap("smallProperty1", "smallProperty2").toList();
Here ("v1", "v2", "v3") is a set of property values of a vertex that is known to exist and thus can be looked up in your graph, and ("smallProperty1", "smallProperty2") is a set of property keys for which you want to look up the values and which do not contain the enormous blobs that cause the row size limit to be exceeded.
Best wishes, Marc
Priyanka Jindal <priyanka...@...>
Hi Marc,
I am not trying to fetch vertices. I need to fetch properties by querying them on their meta-properties. I have one graph index of composite type on a Janus property:
    mgmt.buildIndex("idxAllPropertiesByResourceProviderXidKey", JanusGraphVertexProperty.class)
        .addKey(resourceNameKey).addKey(providerKey).addKey(xidKey)
        .buildCompositeIndex();
I want to fire a query that fetches the properties which have the 3 meta-properties resourceNameKey, providerKey, xidKey:
    tx.query()
        .has(resourceNameKey, "resourceKey")
        .has(providerKey, "providerKey")
        .has(providerKey, "naXid")
        .limit(elementBatchSize).properties()
The above query should return the properties matching the passed meta-property values. So how can your query help here?