Hi dev team,
The JanusGraph users list has seen a number of threads regarding OLAP performance with janusgraph-hbase. In particular, the initial loading of a graph is problematic when the HBase table is stored in a small number of large regions of, say, 10 GB. Such large regions give optimal HBase performance, so system managers are unlikely to welcome HBase-backed graphs with the many small regions needed for good parallelism during OLAP operations. For this reason, HBase 2.0 alpha has introduced a mappers.per.region option in TableInputFormatBase, which allows a single region to be spread over multiple mappers or, in our case, Spark tasks. Anxious to use this feature before HBase 2.0 and a JG version supporting it come out, I made a quick attempt to backport the feature. This turns out to be quite doable; see https://github.com/vtslab/janusgraph/commit/87bf1000c01dfce92e857349ba479db0d3ef6bd1 for the commit. This is initial work and I plan to do a performance benchmark with the Friendster graph, like the TinkerPop team did.
My questions to you:
- Would this work be welcomed as a JanusGraph PR before a release based on HBase 2.0 comes out?
- If so, do you have any suggestions to improve on the work?
Some additional notes:
- SparkGraphComputer has an option to repartition the graph using the workers() method of the GraphComputer builder, but this does not improve the parallelization of the initial load.
- The current HBaseInputFormat has a rather intricate inheritance structure, which will probably need rigorous refactoring to use the HBase 2.0 TableInputFormatBase.
Cheers, Marc
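To illustrate what a mappers-per-region setting has to do, the sketch below cuts one region's row-key range into n contiguous sub-ranges, each of which could be scanned by its own mapper or Spark task. It is a minimal sketch under stated assumptions, not the HBase 2.0 code and not the backport in the commit linked above: the class RegionSplitSketch and its methods are hypothetical names, keys are treated as unsigned big-endian integers padded on the right with zero bytes, and the empty start/end keys of the first and last region of a table are not handled.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of HBase or JanusGraph.
class RegionSplitSketch {

    /** Converts a non-negative value back into a fixed-length, big-endian key. */
    static byte[] toKey(BigInteger value, int length) {
        byte[] raw = value.toByteArray(); // may be shorter, or carry a leading zero sign byte
        byte[] key = new byte[length];
        int copy = Math.min(raw.length, length);
        // Right-align the significant bytes inside the fixed-length key.
        System.arraycopy(raw, raw.length - copy, key, length - copy, copy);
        return key;
    }

    /**
     * Splits the key range [start, end) into n contiguous sub-ranges.
     * Returns n + 1 boundaries; sub-range i is [boundary i, boundary i + 1).
     */
    static List<byte[]> splitRange(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // Pad the shorter key with trailing zero bytes so both have equal length.
        int length = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, length));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, length));
        if (hi.compareTo(lo) <= 0) {
            throw new IllegalArgumentException("end key must sort after start key");
        }
        BigInteger width = hi.subtract(lo);
        List<byte[]> boundaries = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            // boundary i = lo + width * i / n (integer arithmetic)
            BigInteger offset = width.multiply(BigInteger.valueOf(i))
                                     .divide(BigInteger.valueOf(n));
            boundaries.add(toKey(lo.add(offset), length));
        }
        return boundaries;
    }

    public static void main(String[] args) {
        // One region covering keys [0x00, 0x40), spread over 4 tasks.
        for (byte[] boundary : splitRange(new byte[] {0x00}, new byte[] {0x40}, 4)) {
            StringBuilder hex = new StringBuilder();
            for (byte b : boundary) {
                hex.append(String.format("%02x", b));
            }
            System.out.println(hex);
        }
    }
}
```

With this kind of uniform splitting, 5 regions at 4 mappers per region would yield 20 input splits instead of 5. Whether the resulting tasks are balanced depends on how evenly the row keys are distributed within each region.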
|
|
Can’t make a release on a snapshot. Do they have a pre-release release? We already have one dep on an rc1 release.
On Sun, Oct 22, 2017 at 18:29 HadoopMarc <bi...@...> wrote:
> would this work be welcomed as a JanusGraph PR before a release based on HBase 2.0 comes out?
--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fc87971b-664c-4b0b-961a-aef593d9fb40%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
|
|
HadoopMarc <marc.d...@...>
Hi Robert,
I see I caused some confusion. The link I sent does not have any deps on HBase 2.0; it only copies (back-ports) a small bit of HBase 2.0 code, which becomes part of JanusGraph and is tested as such. Once HBase 2.0 becomes available, the back-port would vanish from JanusGraph again for the release branch that depends on HBase 2.0.
So my question is whether there would be support for this back-ported feature on the 0.2.X and 0.3.X branches, which depend on HBase 1.Y.
Cheers, Marc
On Monday, 23 October 2017 05:55:01 UTC+2, Robert Dale wrote:
> Can’t make a release on a snapshot. Do they have a pre-release release? We already have one dep on an rc1 release.
|
|
I think it is a useful feature to have. See the JIRA for more details: https://issues.apache.org/jira/browse/HBASE-16894
On the other hand, it is about timing. It is very likely that the next release of JanusGraph (say 3 months from now) will be close to either HBase 2.0 or HBase 1.4 (which also contains the fix). Then we will have a code backport that may quickly become a duplicate.
Thanks,
On Mon, Oct 23, 2017 at 3:01 AM, HadoopMarc <marc.d...@...> wrote:
> So my question is whether there would be support for this back-ported feature on the 0.2.X and 0.3.X branches which depend on HBase 1.Y.
|
|
Hi Jerry,
Thanks for the info about the HBase 1.4 branch; I was not aware of that. I agree then that it is better to focus our efforts on having the JanusGraph HBaseInputFormat inherit from the HBase TableInputFormatBase. I will do the benchmark test with my current code anyway, to see if performance works out as expected.
Marc
On Monday, 23 October 2017 19:30:25 UTC+2, Jerry He wrote:
> It is very likely that the next release of JanusGraph (say 3 months from now) will be close to either HBase 2.0 or HBase 1.4 (which also contains the fix).
|
|
That will be a very nice performance test!
Thanks.
On Mon, Oct 23, 2017 at 10:45 PM, HadoopMarc <bi...@...> wrote:
> I will do the benchmark test with my current code anyway, to see if performance works out as expected.
|
|