Low-hanging fruit for JanusGraph


Austin Sharp <austins...@...>
 

Yes, I've been on Titan 1.0 for quite some time now. I meant a migration path from Titan 1.0 to Titan 1.1 / JanusGraph. Maybe very little is needed; I haven't looked much into the unreleased Titan 1.1 changes.


On Friday, January 20, 2017 at 10:33:29 AM UTC-8, Jason Plurad wrote:
> 2. Provide a well documented migration path from Titan 1.0

Have you already migrated your Titan 0.4 code over to Titan 1.0?

-- Jason

On Thursday, January 19, 2017 at 4:10:13 PM UTC-5, Austin Sharp wrote:
Hi all,

Extremely excited that JanusGraph is out into the world! I have been working with Titan since 0.4.x and have been hoping for a long time to see new maintainers so that bug reports, pull requests, etc. don't go unheeded.

There are a few long-standing Titan issues that would be easy for JanusGraph to pick up and run with, which would immediately differentiate it from the existing Titan codebase and give people like myself the excuse we're looking for to migrate over! I suspect others have their own wishlists, so I'd encourage everyone to chime in.

A few of my personal ones:

1. Update to newer versions of dependencies (Guava 21, Cassandra 3, Elasticsearch, etc.).
2. Provide a well documented migration path from Titan 1.0
3. Keep up to date with TinkerPop (I know this has been a stated goal elsewhere).
4. Handle supernodes better, for instance by streaming - i.e., when traversing over edges, don't pull all edges in from Cassandra at once. This is my personal bugbear - we keep having to change our schema and use indices or properties when edges are by far the best fit for the model, because Titan can easily blow through the Cassandra frame size even if you set up vertex partitioning to split adjacency lists, among other issues.

Excited to see where things go, and hopefully I can switch to using JanusGraph ASAP!


plu...@...
 

> Related to this area, I think the "hands-off-the-backend" approach that the Titan project took should be ditched.

I've heard this sentiment before, but my guess is that it will be a while before it gets entertained. That being said, I think it's an interesting topic that would be better discussed over on janusgraph-dev.

-- Jason


On Thursday, January 19, 2017 at 8:59:24 PM UTC-5, Adam Phelps wrote:
On 1/19/17 1:06 PM, Austin Sharp wrote:
> 4. Handle supernodes better, for instance by streaming - i.e., when
> traversing over edges, don't pull all edges in from Cassandra at once.
> This is my personal bugbear - we keep having to change our schema and
> use indices or properties when edges are by far the best fit for the
> model, because Titan can easily blow through the Cassandra frame size
> even if you set up vertex partitioning to split adjacency lists, among
> other issues.

This is a big one for us as well, and while we have stuck with a schema
that allows supernodes, we've made all sorts of workarounds in the Java
code that accesses Titan.

In our case we're dealing with HBase underneath, so the limits are
somewhat different. However, similar solutions should be applicable to
any backend, either by changing the row structure or by having the
clients page through the results with multiple calls.

Related to this area, I think the "hands-off-the-backend" approach that
the Titan project took should be ditched. For systems like this,
installing custom HBase filters, coprocessors, etc. can be hugely
beneficial in terms of performance. From talking to the DataStax folks
about their new graph product, it sounds like they've done a lot to
integrate the graph DB with the Cassandra nodes themselves, and I think a
similar approach will be needed to move JanusGraph forward.

- Adam


plu...@...
 

> 2. Provide a well documented migration path from Titan 1.0

Have you already migrated your Titan 0.4 code over to Titan 1.0?

-- Jason


HadoopMarc <m.c.d...@...>
 

I was also delighted to read the notices about JanusGraph!

High on my wishlist is better support for installing JanusGraph on any of the major Hadoop distributions. Many users have had to deal with version conflicts among the commonly used dependencies of Hadoop, Spark, HBase, etc. I am not sure how to handle this. For a start, we should make an inventory of version problems once the first JanusGraph release (candidate) is out.

Cheers, Marc


Adam Phelps <a...@...>
 

On 1/19/17 1:06 PM, Austin Sharp wrote:
> 4. Handle supernodes better, for instance by streaming - i.e., when
> traversing over edges, don't pull all edges in from Cassandra at once.
> This is my personal bugbear - we keep having to change our schema and
> use indices or properties when edges are by far the best fit for the
> model, because Titan can easily blow through the Cassandra frame size
> even if you set up vertex partitioning to split adjacency lists, among
> other issues.

This is a big one for us as well, and while we have stuck with a schema that allows supernodes, we've made all sorts of workarounds in the Java code that accesses Titan.

In our case we're dealing with HBase underneath, so the limits are somewhat different. However, similar solutions should be applicable to any backend, either by changing the row structure or by having the clients page through the results with multiple calls.
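
One way to read "changing the row structure" is to split a supernode's adjacency list across several physical rows keyed by (vertex, bucket), so no single row or response has to carry every edge. The Java sketch below is a generic illustration only: a `TreeMap` stands in for HBase or Cassandra, and the bucket count, the `vertexId:bucket` key format and the class name are all hypothetical, not Titan's or JanusGraph's actual storage layout.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketedAdjacencySketch {

    private final int buckets; // hypothetical number of sub-rows per vertex
    // Simulated wide-row store: row key -> edge ids stored in that row
    private final Map<String, List<Long>> rows = new TreeMap<>();

    BucketedAdjacencySketch(int buckets) {
        if (buckets <= 0) throw new IllegalArgumentException("buckets must be positive");
        this.buckets = buckets;
    }

    // One physical row per (vertex, bucket) pair instead of one row per vertex
    static String rowKey(long vertexId, int bucket) {
        return vertexId + ":" + bucket;
    }

    // Deterministic hash, so a given edge always lands in the same sub-row
    int bucketFor(long edgeId) {
        return Math.floorMod(Long.hashCode(edgeId), buckets);
    }

    void addEdge(long vertexId, long edgeId) {
        rows.computeIfAbsent(rowKey(vertexId, bucketFor(edgeId)), k -> new ArrayList<>()).add(edgeId);
    }

    // Reads the sub-rows one at a time; each read touches only one bucket's edges
    List<Long> allEdges(long vertexId) {
        List<Long> out = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            out.addAll(rows.getOrDefault(rowKey(vertexId, b), Collections.emptyList()));
        }
        return out;
    }

    // Size of the biggest physical row, i.e. the most a single read must carry
    int largestRow() {
        int max = 0;
        for (List<Long> row : rows.values()) max = Math.max(max, row.size());
        return max;
    }

    public static void main(String[] args) {
        BucketedAdjacencySketch store = new BucketedAdjacencySketch(4);
        for (long e = 0; e < 1000; e++) store.addEdge(7L, e);
        System.out.println(store.allEdges(7L).size() + " edges, largest row " + store.largestRow());
    }
}
```

Run as-is, `main` prints `1000 edges, largest row 250`. The trade-off is that a full read now costs one request per bucket, and edges come back bucket by bucket rather than in one sorted order.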

Related to this area, I think the "hands-off-the-backend" approach that the Titan project took should be ditched. For systems like this, installing custom HBase filters, coprocessors, etc. can be hugely beneficial in terms of performance. From talking to the DataStax folks about their new graph product, it sounds like they've done a lot to integrate the graph DB with the Cassandra nodes themselves, and I think a similar approach will be needed to move JanusGraph forward.

- Adam


Austin Sharp <austins...@...>
 

Hi all,

Extremely excited that JanusGraph is out into the world! I have been working with Titan since 0.4.x and have been hoping for a long time to see new maintainers so that bug reports, pull requests, etc. don't go unheeded.

There are a few long-standing Titan issues that would be easy for JanusGraph to pick up and run with, which would immediately differentiate it from the existing Titan codebase and give people like myself the excuse we're looking for to migrate over! I suspect others have their own wishlists, so I'd encourage everyone to chime in.

A few of my personal ones:

1. Update to newer versions of dependencies (Guava 21, Cassandra 3, Elasticsearch, etc.).
2. Provide a well documented migration path from Titan 1.0
3. Keep up to date with TinkerPop (I know this has been a stated goal elsewhere).
4. Handle supernodes better, for instance by streaming - i.e., when traversing over edges, don't pull all edges in from Cassandra at once. This is my personal bugbear - we keep having to change our schema and use indices or properties when edges are by far the best fit for the model, because Titan can easily blow through the Cassandra frame size even if you set up vertex partitioning to split adjacency lists, among other issues.
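
A minimal Java sketch of the streaming idea in item 4, assuming a hypothetical backend call `fetchPage(vertexId, afterEdgeId, limit)` that returns a vertex's edge ids in sorted order (this is not a Titan or JanusGraph API). The iterator holds only one page at a time and resumes each request after the last edge id it handed out, so a supernode's adjacency list is never pulled in at once. `InMemoryBackend` is a hypothetical stand-in for Cassandra.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class EdgeStreamSketch {

    /** Hypothetical storage call: up to `limit` sorted edge ids of `vertexId`
     *  strictly greater than `afterEdgeId`. Not a real Titan/JanusGraph API. */
    interface EdgePageFetcher {
        List<Long> fetchPage(long vertexId, long afterEdgeId, int limit);
    }

    /** Streams a vertex's edges page by page; at most one page is held in memory. */
    static final class PagedEdgeIterator implements Iterator<Long> {
        private final EdgePageFetcher fetcher;
        private final long vertexId;
        private final int pageSize;
        private List<Long> page = new ArrayList<>();
        private int pos = 0;
        private long cursor = Long.MIN_VALUE; // last edge id handed out (MIN_VALUE = start)
        private boolean exhausted = false;

        PagedEdgeIterator(EdgePageFetcher fetcher, long vertexId, int pageSize) {
            if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
            this.fetcher = fetcher;
            this.vertexId = vertexId;
            this.pageSize = pageSize;
        }

        @Override
        public boolean hasNext() {
            if (pos < page.size()) return true;
            if (exhausted) return false;
            // Ask the backend for the next slice, resuming after the cursor
            page = fetcher.fetchPage(vertexId, cursor, pageSize);
            pos = 0;
            // A short (or empty) page means nothing is left after it
            if (page.size() < pageSize) exhausted = true;
            return !page.isEmpty();
        }

        @Override
        public Long next() {
            if (!hasNext()) throw new NoSuchElementException();
            Long edgeId = page.get(pos++);
            cursor = edgeId;
            return edgeId;
        }
    }

    /** In-memory stand-in for one vertex's adjacency list; counts page requests served. */
    static final class InMemoryBackend implements EdgePageFetcher {
        private final List<Long> edges = new ArrayList<>();
        int calls = 0;

        InMemoryBackend(int edgeCount) {
            for (long e = 0; e < edgeCount; e++) edges.add(100 + e); // ids 100, 101, ...
        }

        @Override
        public List<Long> fetchPage(long vertexId, long afterEdgeId, int limit) {
            calls++;
            List<Long> out = new ArrayList<>();
            for (Long e : edges) {
                if (e > afterEdgeId && out.size() < limit) out.add(e);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend(10);
        Iterator<Long> it = new PagedEdgeIterator(backend, 1L, 3);
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        System.out.println(seen + " edges in " + backend.calls + " page requests");
    }
}
```

Run as-is, `main` prints `10 edges in 4 page requests`. Resuming from the last seen key rather than a numeric offset is chosen here because a wide-row store can start the next slice at a column key instead of re-reading and skipping earlier edges; whether a given backend exposes such a slice to clients is an assumption of this sketch.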

Excited to see where things go, and hopefully I can switch to using JanusGraph ASAP!