new contribution - production-worthy in-memory backend with no external dependencies


Dmitry Kovalev <dk.g...@...>
 

Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue
https://github.com/JanusGraph/janusgraph/issues/1482

The initial PR is here:
 https://github.com/JanusGraph/janusgraph/pull/1483
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry


Chris Hupman <chris...@...>
 

I haven't really looked threw the PR much yet, but from your comments I would lean towards updating the existing backend, assuming the test suite doesn't see a performance hit. 

Also, thank you for the contribution. You probably already know this, but given the size of the PR I anticipate it'll take a good chunk of time to properly review. As a result I would like to pre-emptively ask you to be patient with the process. 

Regards,

Chris

On Thursday, March 7, 2019 at 10:53:00 AM UTC-8, Dmitry Kovalev wrote:
Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue

The initial PR is here:
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry


Dmitry Kovalev <dk.g...@...>
 

Hi Chris,

thanks for the reply. Sure, I will refactor the initial commit into an update of the existing backend, and push it as a second commit on the same PR. This should leave both variants easily accessible within the same PR and people can have a look at both and make an informed decision. Hope to find time for this tomorrow.

However if you already had some time allocated for the review - I would urge you to go ahead and look at what's there already, because the difference is going to be mostly in package names etc, and only a couple of classes would become changes to existing ones.

As for the time to review - sure, I appreciate that it is no one's full-time job and can take a while. At the same time, I would say it's not that big or complex actually, and all that is probably needed from any one reviewer is a couple of hours set aside for it. So I will take this as another opportunity to gently invite other reviewers to have a look as well :)

Any questions or anything I can do to make it easier - let me know here or in PM. Happy to do a quick call as well if that helps.

Thanks,
Dmitry

On Wed, 13 Mar 2019 at 20:25, Chris Hupman <chris...@...> wrote:
I haven't really looked threw the PR much yet, but from your comments I would lean towards updating the existing backend, assuming the test suite doesn't see a performance hit. 

Also, thank you for the contribution. You probably already know this, but given the size of the PR I anticipate it'll take a good chunk of time to properly review. As a result I would like to pre-emptively ask you to be patient with the process. 

Regards,

Chris

On Thursday, March 7, 2019 at 10:53:00 AM UTC-8, Dmitry Kovalev wrote:
Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue

The initial PR is here:
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fc56d81b-f573-4b91-9471-e1f0b2a80d1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Dmitry Kovalev <dk.g...@...>
 

Hi All,

This is done now. Right now the PR consists of 3 commits (will squash later once we have a decision):

1) a separate new backend: https://github.com/JanusGraph/janusgraph/pull/1483/commits/82467c2e16388741ea87bbd8ee02ab6b3038ea47
2) same backend refactored as in-place change of existing in-memory backend: https://github.com/JanusGraph/janusgraph/pull/1483/commits/07112636a4f753532e23e68cd3db624d72ba0ff3
3) a bunch of (completely irrelevant) changes required to resolve "issues" raised by Codacy: https://github.com/JanusGraph/janusgraph/pull/1483/commits/a7554ea766eea8a792eb44d730ee7928c0d6828c

I would be most grateful for
a) any opinions/suggestions/approvals or otherwise on the actual design and implementation
b) more votes on the matter of separate backend vs changing the existing one
c) finally, some indication as to which of the remaining Codacy "issues" need to be actually addressed

Many thanks,
Dmitry


On Wednesday, 13 March 2019 22:39:12 UTC, Dmitry Kovalev wrote:
Hi Chris,

thanks for the reply. Sure, I will refactor the initial commit into an update of the existing backend, and push it as a second commit on the same PR. This should leave both variants easily accessible within the same PR and people can have a look at both and make an informed decision. Hope to find time for this tomorrow.

However if you already had some time allocated for the review - I would urge you to go ahead and look at what's there already, because the difference is going to be mostly in package names etc, and only a couple of classes would become changes to existing ones.

As for the time to review - sure, I appreciate that it is no one's full-time job and can take a while. At the same time, I would say it's not that big or complex actually, and all that is probably needed from any one reviewer is a couple of hours set aside for it. So I will take this as another opportunity to gently invite other reviewers to have a look as well :)

Any questions or anything I can do to make it easier - let me know here or in PM. Happy to do a quick call as well if that helps.

Thanks,
Dmitry

On Wed, 13 Mar 2019 at 20:25, Chris Hupman <chris...@...> wrote:
I haven't really looked threw the PR much yet, but from your comments I would lean towards updating the existing backend, assuming the test suite doesn't see a performance hit. 

Also, thank you for the contribution. You probably already know this, but given the size of the PR I anticipate it'll take a good chunk of time to properly review. As a result I would like to pre-emptively ask you to be patient with the process. 

Regards,

Chris

On Thursday, March 7, 2019 at 10:53:00 AM UTC-8, Dmitry Kovalev wrote:
Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue

The initial PR is here:
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To post to this group, send email to janusgraph-dev@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fc56d81b-f573-4b91-9471-e1f0b2a80d1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Dmitry Kovalev <dk.g...@...>
 

Just a gentle reminder - please could anyone spare some time to review?

Thank you, 
Dmitry 

On Fri, 15 Mar 2019, 01:36 Dmitry Kovalev, <dk.g...@...> wrote:
Hi All,

This is done now. Right now the PR consists of 3 commits (will squash later once we have a decision):

2) same backend refactored as in-place change of existing in-memory backend: https://github.com/JanusGraph/janusgraph/pull/1483/commits/07112636a4f753532e23e68cd3db624d72ba0ff3
3) a bunch of (completely irrelevant) changes required to resolve "issues" raised by Codacy: https://github.com/JanusGraph/janusgraph/pull/1483/commits/a7554ea766eea8a792eb44d730ee7928c0d6828c

I would be most grateful for
a) any opinions/suggestions/approvals or otherwise on the actual design and implementation
b) more votes on the matter of separate backend vs changing the existing one
c) finally, some indication as to which of the remaining Codacy "issues" need to be actually addressed

Many thanks,
Dmitry


On Wednesday, 13 March 2019 22:39:12 UTC, Dmitry Kovalev wrote:
Hi Chris,

thanks for the reply. Sure, I will refactor the initial commit into an update of the existing backend, and push it as a second commit on the same PR. This should leave both variants easily accessible within the same PR and people can have a look at both and make an informed decision. Hope to find time for this tomorrow.

However if you already had some time allocated for the review - I would urge you to go ahead and look at what's there already, because the difference is going to be mostly in package names etc, and only a couple of classes would become changes to existing ones.

As for the time to review - sure, I appreciate that it is no one's full-time job and can take a while. At the same time, I would say it's not that big or complex actually, and all that is probably needed from any one reviewer is a couple of hours set aside for it. So I will take this as another opportunity to gently invite other reviewers to have a look as well :)

Any questions or anything I can do to make it easier - let me know here or in PM. Happy to do a quick call as well if that helps.

Thanks,
Dmitry

On Wed, 13 Mar 2019 at 20:25, Chris Hupman <chris...@...> wrote:
I haven't really looked threw the PR much yet, but from your comments I would lean towards updating the existing backend, assuming the test suite doesn't see a performance hit. 

Also, thank you for the contribution. You probably already know this, but given the size of the PR I anticipate it'll take a good chunk of time to properly review. As a result I would like to pre-emptively ask you to be patient with the process. 

Regards,

Chris

On Thursday, March 7, 2019 at 10:53:00 AM UTC-8, Dmitry Kovalev wrote:
Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue

The initial PR is here:
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fc56d81b-f573-4b91-9471-e1f0b2a80d1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/0de831e3-28d5-42ba-b27c-b3e50af8099f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Henry Saputra <henry....@...>
 

Thanks, Dmitry. Please allow us some time for the review. Quick question, there are bunch of TODO in the PR, do you plan to address it soon or is it currently trade-off between time and initial implementation for the solution?

Thanks,

Henry

On Thu, Mar 28, 2019 at 10:05 AM Dmitry Kovalev <dk.g...@...> wrote:
Just a gentle reminder - please could anyone spare some time to review?

Thank you, 
Dmitry 

On Fri, 15 Mar 2019, 01:36 Dmitry Kovalev, <dk.g...@...> wrote:
Hi All,

This is done now. Right now the PR consists of 3 commits (will squash later once we have a decision):

2) same backend refactored as in-place change of existing in-memory backend: https://github.com/JanusGraph/janusgraph/pull/1483/commits/07112636a4f753532e23e68cd3db624d72ba0ff3
3) a bunch of (completely irrelevant) changes required to resolve "issues" raised by Codacy: https://github.com/JanusGraph/janusgraph/pull/1483/commits/a7554ea766eea8a792eb44d730ee7928c0d6828c

I would be most grateful for
a) any opinions/suggestions/approvals or otherwise on the actual design and implementation
b) more votes on the matter of separate backend vs changing the existing one
c) finally, some indication as to which of the remaining Codacy "issues" need to be actually addressed

Many thanks,
Dmitry


On Wednesday, 13 March 2019 22:39:12 UTC, Dmitry Kovalev wrote:
Hi Chris,

thanks for the reply. Sure, I will refactor the initial commit into an update of the existing backend, and push it as a second commit on the same PR. This should leave both variants easily accessible within the same PR and people can have a look at both and make an informed decision. Hope to find time for this tomorrow.

However if you already had some time allocated for the review - I would urge you to go ahead and look at what's there already, because the difference is going to be mostly in package names etc, and only a couple of classes would become changes to existing ones.

As for the time to review - sure, I appreciate that it is no one's full-time job and can take a while. At the same time, I would say it's not that big or complex actually, and all that is probably needed from any one reviewer is a couple of hours set aside for it. So I will take this as another opportunity to gently invite other reviewers to have a look as well :)

Any questions or anything I can do to make it easier - let me know here or in PM. Happy to do a quick call as well if that helps.

Thanks,
Dmitry

On Wed, 13 Mar 2019 at 20:25, Chris Hupman <chris...@...> wrote:
I haven't really looked threw the PR much yet, but from your comments I would lean towards updating the existing backend, assuming the test suite doesn't see a performance hit. 

Also, thank you for the contribution. You probably already know this, but given the size of the PR I anticipate it'll take a good chunk of time to properly review. As a result I would like to pre-emptively ask you to be patient with the process. 

Regards,

Chris

On Thursday, March 7, 2019 at 10:53:00 AM UTC-8, Dmitry Kovalev wrote:
Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue

The initial PR is here:
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fc56d81b-f573-4b91-9471-e1f0b2a80d1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/0de831e3-28d5-42ba-b27c-b3e50af8099f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CAJ6znMhPX4TVrifM8Do4VaSZH%3D3uNWnuZLYQUtp4Q9qYDeyp-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Dmitry Kovalev <dk.g...@...>
 

Hi Henry,

Thanks for coming back on this. 

I had a quick look at the todos right now, and I think they basically can all be removed - which I will do early next week when I come back from travelling. 

They mostly were some optimisation/efficiency ideas left there during development, to explore later based on profiling on real data, and proved to be irrelevant so far (after a few years of production use). They were left in the internal codebase as potentially useful leads for the future, but I agree that they are confusing in this context and I should have removed them during the initial cleanup for this contribution.

Implementing them at this point would just add complexity to the code without measurable benefits, unless someone comes up with a data set where profiling suggests it will make a difference. 


Many thanks, 
Dmitry 


On Thu, 28 Mar 2019, 18:32 Henry Saputra, <henry....@...> wrote:
Thanks, Dmitry. Please allow us some time for the review. Quick question, there are bunch of TODO in the PR, do you plan to address it soon or is it currently trade-off between time and initial implementation for the solution?

Thanks,

Henry

On Thu, Mar 28, 2019 at 10:05 AM Dmitry Kovalev <dk.g...@...> wrote:
Just a gentle reminder - please could anyone spare some time to review?

Thank you, 
Dmitry 

On Fri, 15 Mar 2019, 01:36 Dmitry Kovalev, <dk.g...@...> wrote:
Hi All,

This is done now. Right now the PR consists of 3 commits (will squash later once we have a decision):

2) same backend refactored as in-place change of existing in-memory backend: https://github.com/JanusGraph/janusgraph/pull/1483/commits/07112636a4f753532e23e68cd3db624d72ba0ff3
3) a bunch of (completely irrelevant) changes required to resolve "issues" raised by Codacy: https://github.com/JanusGraph/janusgraph/pull/1483/commits/a7554ea766eea8a792eb44d730ee7928c0d6828c

I would be most grateful for
a) any opinions/suggestions/approvals or otherwise on the actual design and implementation
b) more votes on the matter of separate backend vs changing the existing one
c) finally, some indication as to which of the remaining Codacy "issues" need to be actually addressed

Many thanks,
Dmitry


On Wednesday, 13 March 2019 22:39:12 UTC, Dmitry Kovalev wrote:
Hi Chris,

thanks for the reply. Sure, I will refactor the initial commit into an update of the existing backend, and push it as a second commit on the same PR. This should leave both variants easily accessible within the same PR and people can have a look at both and make an informed decision. Hope to find time for this tomorrow.

However if you already had some time allocated for the review - I would urge you to go ahead and look at what's there already, because the difference is going to be mostly in package names etc, and only a couple of classes would become changes to existing ones.

As for the time to review - sure, I appreciate that it is no one's full-time job and can take a while. At the same time, I would say it's not that big or complex actually, and all that is probably needed from any one reviewer is a couple of hours set aside for it. So I will take this as another opportunity to gently invite other reviewers to have a look as well :)

Any questions or anything I can do to make it easier - let me know here or in PM. Happy to do a quick call as well if that helps.

Thanks,
Dmitry

On Wed, 13 Mar 2019 at 20:25, Chris Hupman <chris...@...> wrote:
I haven't really looked threw the PR much yet, but from your comments I would lean towards updating the existing backend, assuming the test suite doesn't see a performance hit. 

Also, thank you for the contribution. You probably already know this, but given the size of the PR I anticipate it'll take a good chunk of time to properly review. As a result I would like to pre-emptively ask you to be patient with the process. 

Regards,

Chris

On Thursday, March 7, 2019 at 10:53:00 AM UTC-8, Dmitry Kovalev wrote:
Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue

The initial PR is here:
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fc56d81b-f573-4b91-9471-e1f0b2a80d1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/0de831e3-28d5-42ba-b27c-b3e50af8099f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CAJ6znMhPX4TVrifM8Do4VaSZH%3D3uNWnuZLYQUtp4Q9qYDeyp-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CALuGr6ZQkOvcdcgW%3DUkx71RZyta9M9VTPrzas0OxN7Ra-LiGFw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Dmitry Kovalev <dk.g...@...>
 

Just pushed a commit to remove/reclassify all TODOs in the code.


On Thursday, 28 March 2019 20:17:22 UTC, Dmitry Kovalev wrote:
Hi Henry,

Thanks for coming back on this. 

I had a quick look at the todos right now, and I think they basically can all be removed - which I will do early next week when I come back from travelling. 

They mostly were some optimisation/efficiency ideas left there during development, to explore later based on profiling on real data, and proved to be irrelevant so far (after a few years of production use). They were left in the internal codebase as potentially useful leads for the future, but I agree that they are confusing in this context and I should have removed them during the initial cleanup for this contribution.

Implementing them at this point would just add complexity to the code without measurable benefits, unless someone comes up with a data set where profiling suggests it will make a difference. 


Many thanks, 
Dmitry 


On Thu, 28 Mar 2019, 18:32 Henry Saputra, <henry....@...> wrote:
Thanks, Dmitry. Please allow us some time for the review. Quick question, there are bunch of TODO in the PR, do you plan to address it soon or is it currently trade-off between time and initial implementation for the solution?

Thanks,

Henry

On Thu, Mar 28, 2019 at 10:05 AM Dmitry Kovalev <dk.g...@...> wrote:
Just a gentle reminder - please could anyone spare some time to review?

Thank you, 
Dmitry 

On Fri, 15 Mar 2019, 01:36 Dmitry Kovalev, <dk.g...@...> wrote:
Hi All,

This is done now. Right now the PR consists of 3 commits (will squash later once we have a decision):

2) same backend refactored as in-place change of existing in-memory backend: https://github.com/JanusGraph/janusgraph/pull/1483/commits/07112636a4f753532e23e68cd3db624d72ba0ff3
3) a bunch of (completely irrelevant) changes required to resolve "issues" raised by Codacy: https://github.com/JanusGraph/janusgraph/pull/1483/commits/a7554ea766eea8a792eb44d730ee7928c0d6828c

I would be most grateful for
a) any opinions/suggestions/approvals or otherwise on the actual design and implementation
b) more votes on the matter of separate backend vs changing the existing one
c) finally, some indication as to which of the remaining Codacy "issues" need to be actually addressed

Many thanks,
Dmitry


On Wednesday, 13 March 2019 22:39:12 UTC, Dmitry Kovalev wrote:
Hi Chris,

thanks for the reply. Sure, I will refactor the initial commit into an update of the existing backend, and push it as a second commit on the same PR. This should leave both variants easily accessible within the same PR and people can have a look at both and make an informed decision. Hope to find time for this tomorrow.

However if you already had some time allocated for the review - I would urge you to go ahead and look at what's there already, because the difference is going to be mostly in package names etc, and only a couple of classes would become changes to existing ones.

As for the time to review - sure, I appreciate that it is no one's full-time job and can take a while. At the same time, I would say it's not that big or complex actually, and all that is probably needed from any one reviewer is a couple of hours set aside for it. So I will take this as another opportunity to gently invite other reviewers to have a look as well :)

Any questions or anything I can do to make it easier - let me know here or in PM. Happy to do a quick call as well if that helps.

Thanks,
Dmitry

On Wed, 13 Mar 2019 at 20:25, Chris Hupman <chris...@...> wrote:
I haven't really looked threw the PR much yet, but from your comments I would lean towards updating the existing backend, assuming the test suite doesn't see a performance hit. 

Also, thank you for the contribution. You probably already know this, but given the size of the PR I anticipate it'll take a good chunk of time to properly review. As a result I would like to pre-emptively ask you to be patient with the process. 

Regards,

Chris

On Thursday, March 7, 2019 at 10:53:00 AM UTC-8, Dmitry Kovalev wrote:
Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue

The initial PR is here:
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To post to this group, send email to janusgraph-dev@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fc56d81b-f573-4b91-9471-e1f0b2a80d1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To post to this group, send email to janusgraph-dev@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/0de831e3-28d5-42ba-b27c-b3e50af8099f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To post to this group, send email to janusgraph-dev@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CAJ6znMhPX4TVrifM8Do4VaSZH%3D3uNWnuZLYQUtp4Q9qYDeyp-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To post to this group, send email to janusgraph-dev@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CALuGr6ZQkOvcdcgW%3DUkx71RZyta9M9VTPrzas0OxN7Ra-LiGFw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Henry Saputra <henry....@...>
 

Thanks, Dmitry. 

Looks like are already several comments for the PR for the review, could start addressing them to help the review.

Thanks,

Henry


On Wed, Apr 3, 2019 at 5:05 AM Dmitry Kovalev <dk.g...@...> wrote:
Just pushed a commit to remove/reclassify all TODOs in the code.

On Thursday, 28 March 2019 20:17:22 UTC, Dmitry Kovalev wrote:
Hi Henry,

Thanks for coming back on this. 

I had a quick look at the todos right now, and I think they basically can all be removed - which I will do early next week when I come back from travelling. 

They mostly were some optimisation/efficiency ideas left there during development, to explore later based on profiling on real data, and proved to be irrelevant so far (after a few years of production use). They were left in the internal codebase as potentially useful leads for the future, but I agree that they are confusing in this context and I should have removed them during the initial cleanup for this contribution.

Implementing them at this point would just add complexity to the code without measurable benefits, unless someone comes up with a data set where profiling suggests it will make a difference. 


Many thanks, 
Dmitry 


On Thu, 28 Mar 2019, 18:32 Henry Saputra, <henry....@...> wrote:
Thanks, Dmitry. Please allow us some time for the review. Quick question, there are bunch of TODO in the PR, do you plan to address it soon or is it currently trade-off between time and initial implementation for the solution?

Thanks,

Henry

On Thu, Mar 28, 2019 at 10:05 AM Dmitry Kovalev <dk.g...@...> wrote:
Just a gentle reminder - please could anyone spare some time to review?

Thank you, 
Dmitry 

On Fri, 15 Mar 2019, 01:36 Dmitry Kovalev, <dk.g...@...> wrote:
Hi All,

This is done now. Right now the PR consists of 3 commits (will squash later once we have a decision):

2) same backend refactored as in-place change of existing in-memory backend: https://github.com/JanusGraph/janusgraph/pull/1483/commits/07112636a4f753532e23e68cd3db624d72ba0ff3
3) a bunch of (completely irrelevant) changes required to resolve "issues" raised by Codacy: https://github.com/JanusGraph/janusgraph/pull/1483/commits/a7554ea766eea8a792eb44d730ee7928c0d6828c

I would be most grateful for
a) any opinions/suggestions/approvals or otherwise on the actual design and implementation
b) more votes on the matter of separate backend vs changing the existing one
c) finally, some indication as to which of the remaining Codacy "issues" need to be actually addressed

Many thanks,
Dmitry


On Wednesday, 13 March 2019 22:39:12 UTC, Dmitry Kovalev wrote:
Hi Chris,

thanks for the reply. Sure, I will refactor the initial commit into an update of the existing backend, and push it as a second commit on the same PR. This should leave both variants easily accessible within the same PR and people can have a look at both and make an informed decision. Hope to find time for this tomorrow.

However if you already had some time allocated for the review - I would urge you to go ahead and look at what's there already, because the difference is going to be mostly in package names etc, and only a couple of classes would become changes to existing ones.

As for the time to review - sure, I appreciate that it is no one's full-time job and can take a while. At the same time, I would say it's not that big or complex actually, and all that is probably needed from any one reviewer is a couple of hours set aside for it. So I will take this as another opportunity to gently invite other reviewers to have a look as well :)

Any questions or anything I can do to make it easier - let me know here or in PM. Happy to do a quick call as well if that helps.

Thanks,
Dmitry

On Wed, 13 Mar 2019 at 20:25, Chris Hupman <chris...@...> wrote:
I haven't really looked threw the PR much yet, but from your comments I would lean towards updating the existing backend, assuming the test suite doesn't see a performance hit. 

Also, thank you for the contribution. You probably already know this, but given the size of the PR I anticipate it'll take a good chunk of time to properly review. As a result I would like to pre-emptively ask you to be patient with the process. 

Regards,

Chris

On Thursday, March 7, 2019 at 10:53:00 AM UTC-8, Dmitry Kovalev wrote:
Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue

The initial PR is here:
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fc56d81b-f573-4b91-9471-e1f0b2a80d1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/0de831e3-28d5-42ba-b27c-b3e50af8099f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CAJ6znMhPX4TVrifM8Do4VaSZH%3D3uNWnuZLYQUtp4Q9qYDeyp-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CALuGr6ZQkOvcdcgW%3DUkx71RZyta9M9VTPrzas0OxN7Ra-LiGFw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To post to this group, send email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/2877bbe5-8c5f-4a5b-9aea-e4dbbec16af9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Dmitry Kovalev <dk.g...@...>
 

Hi All,


thank you for all the comments so far on the issue, PR and in the dev maillist. Unfortunately I was not able to obtain from these comments a clear understanding of how to proceed with this contribution, and there doesn't seem to be a lot of traffic on it in general. I would really like to get a better understanding of where we stand in general before I proceed to address the cosmetic/style/refactoring comments.


So I have collated a few options based on the feedback so far, and I would like to ask everyone, especially the previous commenters and also preferably active developers/TSC members to look through them and cast their "vote" on the issue page:


https://github.com/JanusGraph/janusgraph/issues/1482#issuecomment-536026375


Hopefully the fact that one just has to choose an option will increase participation :), but if you have more to say on the matter than just choose from the options, then by all means please do so.


Thank you,
Dmitry


On Wednesday, 10 April 2019 06:43:16 UTC+1, Henry Saputra wrote:
Thanks, Dmitry. 

Looks like are already several comments for the PR for the review, could start addressing them to help the review.

Thanks,

Henry

On Wed, Apr 3, 2019 at 5:05 AM Dmitry Kovalev <d...@...> wrote:
Just pushed a commit to remove/reclassify all TODOs in the code.

On Thursday, 28 March 2019 20:17:22 UTC, Dmitry Kovalev wrote:
Hi Henry,

Thanks for coming back on this. 

I had a quick look at the todos right now, and I think they basically can all be removed - which I will do early next week when I come back from travelling. 

They mostly were some optimisation/efficiency ideas left there during development, to explore later based on profiling on real data, and proved to be irrelevant so far (after a few years of production use). They were left in the internal codebase as potentially useful leads for the future, but I agree that they are confusing in this context and I should have removed them during the initial cleanup for this contribution.

Implementing them at this point would just add complexity to the code without measurable benefits, unless someone comes up with a data set where profiling suggests it will make a difference. 


Many thanks, 
Dmitry 


On Thu, 28 Mar 2019, 18:32 Henry Saputra, <he...@...> wrote:
Thanks, Dmitry. Please allow us some time for the review. Quick question, there are bunch of TODO in the PR, do you plan to address it soon or is it currently trade-off between time and initial implementation for the solution?

Thanks,

Henry

On Thu, Mar 28, 2019 at 10:05 AM Dmitry Kovalev <d...@...> wrote:
Just a gentle reminder - please could anyone spare some time to review?

Thank you, 
Dmitry 

On Fri, 15 Mar 2019, 01:36 Dmitry Kovalev, <d...@...> wrote:
Hi All,

This is done now. Right now the PR consists of 3 commits (will squash later once we have a decision):

2) same backend refactored as in-place change of existing in-memory backend: https://github.com/JanusGraph/janusgraph/pull/1483/commits/07112636a4f753532e23e68cd3db624d72ba0ff3
3) a bunch of (completely irrelevant) changes required to resolve "issues" raised by Codacy: https://github.com/JanusGraph/janusgraph/pull/1483/commits/a7554ea766eea8a792eb44d730ee7928c0d6828c

I would be most grateful for
a) any opinions/suggestions/approvals or otherwise on the actual design and implementation
b) more votes on the matter of separate backend vs changing the existing one
c) finally, some indication as to which of the remaining Codacy "issues" need to be actually addressed

Many thanks,
Dmitry


On Wednesday, 13 March 2019 22:39:12 UTC, Dmitry Kovalev wrote:
Hi Chris,

thanks for the reply. Sure, I will refactor the initial commit into an update of the existing backend, and push it as a second commit on the same PR. This should leave both variants easily accessible within the same PR and people can have a look at both and make an informed decision. Hope to find time for this tomorrow.

However if you already had some time allocated for the review - I would urge you to go ahead and look at what's there already, because the difference is going to be mostly in package names etc, and only a couple of classes would become changes to existing ones.

As for the time to review - sure, I appreciate that it is no one's full-time job and can take a while. At the same time, I would say it's not that big or complex actually, and all that is probably needed from any one reviewer is a couple of hours set aside for it. So I will take this as another opportunity to gently invite other reviewers to have a look as well :)

Any questions or anything I can do to make it easier - let me know here or in PM. Happy to do a quick call as well if that helps.

Thanks,
Dmitry

On Wed, 13 Mar 2019 at 20:25, Chris Hupman <ch...@...> wrote:
I haven't really looked threw the PR much yet, but from your comments I would lean towards updating the existing backend, assuming the test suite doesn't see a performance hit. 

Also, thank you for the contribution. You probably already know this, but given the size of the PR I anticipate it'll take a good chunk of time to properly review. As a result I would like to pre-emptively ask you to be patient with the process. 

Regards,

Chris

On Thursday, March 7, 2019 at 10:53:00 AM UTC-8, Dmitry Kovalev wrote:
Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue

The initial PR is here:
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@googlegroups.com.
To post to this group, send email to janusgr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fc56d81b-f573-4b91-9471-e1f0b2a80d1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@googlegroups.com.
To post to this group, send email to janusgr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/0de831e3-28d5-42ba-b27c-b3e50af8099f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@googlegroups.com.
To post to this group, send email to janusgr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CAJ6znMhPX4TVrifM8Do4VaSZH%3D3uNWnuZLYQUtp4Q9qYDeyp-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@googlegroups.com.
To post to this group, send email to janusgr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CALuGr6ZQkOvcdcgW%3DUkx71RZyta9M9VTPrzas0OxN7Ra-LiGFw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@googlegroups.com.
To post to this group, send email to janusgr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/2877bbe5-8c5f-4a5b-9aea-e4dbbec16af9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Jan Jansen <faro...@...>
 

Hi Dmitry,

I would like to have production ready inmemory backend. If you look into the PR, you see that a lot of work has put into the code.
To review completely, you have to understand why something has done this way and not another way. I don't think it is possible review completely in near future,
but for me it is okay to merge PR check the code style, all tests including tinkerpop tests are passed and we put the code into an extra project.

Greetings,
Jan

Am Freitag, 27. September 2019 19:28:36 UTC+2 schrieb Dmitry Kovalev:

Hi All,


thank you for all the comments so far on the issue, PR and in the dev maillist. Unfortunately I was not able to obtain from these comments a clear understanding of how to proceed with this contribution, and there doesn't seem to be a lot of traffic on it in general. I would really like to get a better understanding of where we stand in general before I proceed to address the cosmetic/style/refactoring comments.


So I have collated a few options based on the feedback so far, and I would like to ask everyone, especially the previous commenters and also preferably active developers/TSC members to look through them and cast their "vote" on the issue page:


https://github.com/JanusGraph/janusgraph/issues/1482#issuecomment-536026375


Hopefully the fact that one just has to choose an option will increase participation :), but if you have more to say on the matter than just choose from the options, then by all means please do so.


Thank you,
Dmitry

On Wednesday, 10 April 2019 06:43:16 UTC+1, Henry Saputra wrote:
Thanks, Dmitry. 

Looks like are already several comments for the PR for the review, could start addressing them to help the review.

Thanks,

Henry

On Wed, Apr 3, 2019 at 5:05 AM Dmitry Kovalev <d...@...> wrote:
Just pushed a commit to remove/reclassify all TODOs in the code.

On Thursday, 28 March 2019 20:17:22 UTC, Dmitry Kovalev wrote:
Hi Henry,

Thanks for coming back on this. 

I had a quick look at the todos right now, and I think they basically can all be removed - which I will do early next week when I come back from travelling. 

They mostly were some optimisation/efficiency ideas left there during development, to explore later based on profiling on real data, and proved to be irrelevant so far (after a few years of production use). They were left in the internal codebase as potentially useful leads for the future, but I agree that they are confusing in this context and I should have removed them during the initial cleanup for this contribution.

Implementing them at this point would just add complexity to the code without measurable benefits, unless someone comes up with a data set where profiling suggests it will make a difference. 


Many thanks, 
Dmitry 


On Thu, 28 Mar 2019, 18:32 Henry Saputra, <he...@...> wrote:
Thanks, Dmitry. Please allow us some time for the review. Quick question, there are bunch of TODO in the PR, do you plan to address it soon or is it currently trade-off between time and initial implementation for the solution?

Thanks,

Henry

On Thu, Mar 28, 2019 at 10:05 AM Dmitry Kovalev <d...@...> wrote:
Just a gentle reminder - please could anyone spare some time to review?

Thank you, 
Dmitry 

On Fri, 15 Mar 2019, 01:36 Dmitry Kovalev, <d...@...> wrote:
Hi All,

This is done now. Right now the PR consists of 3 commits (will squash later once we have a decision):

2) same backend refactored as in-place change of existing in-memory backend: https://github.com/JanusGraph/janusgraph/pull/1483/commits/07112636a4f753532e23e68cd3db624d72ba0ff3
3) a bunch of (completely irrelevant) changes required to resolve "issues" raised by Codacy: https://github.com/JanusGraph/janusgraph/pull/1483/commits/a7554ea766eea8a792eb44d730ee7928c0d6828c

I would be most grateful for
a) any opinions/suggestions/approvals or otherwise on the actual design and implementation
b) more votes on the matter of separate backend vs changing the existing one
c) finally, some indication as to which of the remaining Codacy "issues" need to be actually addressed

Many thanks,
Dmitry


On Wednesday, 13 March 2019 22:39:12 UTC, Dmitry Kovalev wrote:
Hi Chris,

thanks for the reply. Sure, I will refactor the initial commit into an update of the existing backend, and push it as a second commit on the same PR. This should leave both variants easily accessible within the same PR and people can have a look at both and make an informed decision. Hope to find time for this tomorrow.

However if you already had some time allocated for the review - I would urge you to go ahead and look at what's there already, because the difference is going to be mostly in package names etc, and only a couple of classes would become changes to existing ones.

As for the time to review - sure, I appreciate that it is no one's full-time job and can take a while. At the same time, I would say it's not that big or complex actually, and all that is probably needed from any one reviewer is a couple of hours set aside for it. So I will take this as another opportunity to gently invite other reviewers to have a look as well :)

Any questions or anything I can do to make it easier - let me know here or in PM. Happy to do a quick call as well if that helps.

Thanks,
Dmitry

On Wed, 13 Mar 2019 at 20:25, Chris Hupman <ch...@...> wrote:
I haven't really looked threw the PR much yet, but from your comments I would lean towards updating the existing backend, assuming the test suite doesn't see a performance hit. 

Also, thank you for the contribution. You probably already know this, but given the size of the PR I anticipate it'll take a good chunk of time to properly review. As a result I would like to pre-emptively ask you to be patient with the process. 

Regards,

Chris

On Thursday, March 7, 2019 at 10:53:00 AM UTC-8, Dmitry Kovalev wrote:
Hi JanusGraph team,

We would like to contribute an improved implementation of in-memory backend for JanusGraph.
 
Background/Rationale

There are many possible applications for both embedded and standalone JanusGraph with a 100% in-memory backend – i.e. cases where the graph can potentially fit within the boundaries of one JVM, is built dynamically and in parallel transactions, and the performance during build-out of the graph and querying it is critical. A quick search in the issues and mailing list seems to confirm that there is general interest in this kind of use case, both for JanusGraph specifically and for an open-source in-memory graph in general (e.g. TinkerGraph may not fit the bill in all cases as it is not transactional etc).
 
However there seems to be a bit of a gap in JanusGraph offering in this space. Current implementation of JanusGraph’s in-memory backend has not changed since Titan times, and is still declared as “for testing only”, “not ready for production use”.

Contribution

We have done some analysis a while back which shows that the main obstacle for production use of current in-memory backend is enormous overhead when storing key values (likely a trade-off vs simplicity of implementation for something that was only intended for testing purposes).
Quite often the total overhead of wrapping data structures and references is bigger than the actual data being stored. 

After a series of improvements this overhead was significantly reduced (2x-5x depending on the “shape” of the data stored and the size of individual data entries). 

The modified version of in-memory backend is successfully used in production, handling up to 70+ concurrent read/write transactions at any one moment, 10+ millions of vertices (of which quite a few have up to 60 properties) and about 3x more edges, within one JVM.

More detailed analysis and memory profiling of current vs improved implementation is attached to the corresponding github issue

The initial PR is here:
 

NOTE: the PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.

However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.

This is because:

a)      the modifications are fairly straightforward and do not change the general structure of existing backend

b)      don’t add too much extra code or any external dependencies

c)       don’t require any additional documentation or setup instructions, at least initially

d)      all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses

e)      the same backend will simply be fit for production use, no new backends to maintain


Any thoughts on the contribution as a whole and new backend vs updating the existing one are much appreciated.
 

Many thanks,

Dmitry

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jan...@....
To post to this group, send email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/fc56d81b-f573-4b91-9471-e1f0b2a80d1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jan...@....
To post to this group, send email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/0de831e3-28d5-42ba-b27c-b3e50af8099f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jan...@....
To post to this group, send email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CAJ6znMhPX4TVrifM8Do4VaSZH%3D3uNWnuZLYQUtp4Q9qYDeyp-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jan...@....
To post to this group, send email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/CALuGr6ZQkOvcdcgW%3DUkx71RZyta9M9VTPrzas0OxN7Ra-LiGFw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jan...@....
To post to this group, send email to jan...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/2877bbe5-8c5f-4a5b-9aea-e4dbbec16af9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Jan Jansen <faro...@...>
 

Hi

We should extract the in-memory backend into an extra maven project before merging your PR.

Doing this in a single PR would make it hard to review.

I will create an issue for it. Dmitry, if you like to do this before, otherwise I can do this.

Greetings,
Jan


Dmitry Kovalev <dk.g...@...>
 

Hi Jan,

thank you for the reply. If there was a separate decision made to move the existing in-memory backend out of core package, then I agree it makes sense to do it before applying any changes. 

Unfortunately I am not in a position right now to do this move myself, but I will be happy to rebase my PR accordingly once it is done. Then I will proceed with addressing all the comments in there.

Thanks a lot,
Dmitry




On Mon, 30 Sep 2019 at 09:53, 'Jan Jansen' via JanusGraph developers <janusgr...@...> wrote:
Hi

We should extract the in-memory backend into an extra maven project before merging your PR.

Doing this in a single PR would make it hard to review.

I will create an issue for it. Dmitry, if you like to do this before, otherwise I can do this.

Greetings,
Jan

--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/23404d0c-759d-46ab-9df4-72fc49bc2d34%40googlegroups.com.