JanusGraph project at The Linux Foundation

Jason Plurad <plu...@...>
 

Today, The Linux Foundation announced the creation of the JanusGraph project. We're excited to push forward the open source, collaborative effort on scalable graph databases that was initiated by the Aurelius team with Titan. JanusGraph will continue to be a native Apache TinkerPop implementation. The first effort definitely includes finally upgrading beyond 3.0.1-incubating :)

There are janusgraph-users and janusgraph-dev Google Groups created for public mailing list collaboration, but as is the case with other providers in the TinkerPop ecosystem, I'd expect cross-traffic to continue with the TinkerPop lists.

If you will be in Austin this weekend for Graph Day Texas, there is a great lineup of graph-related talks. Ted Wilmes and I from Apache TinkerPop will be there, and so will others involved in getting JanusGraph established. After the Graph Day happy hour, we will gather for an informal meetup/birds-of-a-feather with anybody interested.


Have a good one,
Jason


Low-hanging fruit for JanusGraph

Austin Sharp <austins...@...>
 

Hi all,

I'm extremely excited that JanusGraph is out in the world! I have been working with Titan since 0.4.x and have been hoping for a long time to see new maintainers so that bug reports, pull requests, etc. don't go unheeded.

There are a few long-standing Titan issues that would be easy for JanusGraph to pick up and run with, to immediately differentiate from the existing Titan codebase and give people like myself the excuse we're looking for to migrate over! I suspect others have their own wishlists, so I'd encourage everyone to chime in.

A few of my personal ones:

1. Update to newer versions of dependencies (Guava 21, Cassandra 3, Elasticsearch, etc.).
2. Provide a well-documented migration path from Titan 1.0.
3. Keep up to date with TinkerPop (I know this has been a stated goal elsewhere).
4. Handle supernodes better, for instance by streaming - i.e., when traversing over edges, don't pull all edges in from Cassandra at once. This is my personal bugbear - we keep having to change our schema and use indices or properties when edges are by far the best fit for the model, because Titan can easily blow through the Cassandra frame size even if you set up vertex partitioning to split adjacency lists, among other issues.

Excited to see where things go, and hopefully I can switch to using JanusGraph ASAP!


Re: Low-hanging fruit for JanusGraph

Adam Phelps <a...@...>
 

On 1/19/17 1:06 PM, Austin Sharp wrote:
> 4. Handle supernodes better, for instance by streaming - i.e., when
> traversing over edges, don't pull all edges in from Cassandra at once.
> This is my personal bugbear - we keep having to change our schema and
> use indices or properties when edges are by far the best fit for the
> model, because Titan can easily blow through the Cassandra frame size
> even if you set up vertex partitioning to split adjacency lists, among
> other issues.

This is a big one for us as well, and while we have stuck with a schema that allows supernodes, we've made all sorts of workarounds in our Java code which accesses Titan.

In our case we're dealing with HBase underneath, so the limits are somewhat different. However, similar solutions should be applicable to any backend, either by changing the row structure or by having the clients page through the results with multiple calls.
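
The client-side paging idea can be sketched roughly as follows. This is a minimal, backend-agnostic illustration in Python, not Titan or JanusGraph code: `fetch_edge_page`, its parameters, and the in-memory `ADJACENCY` table are hypothetical stand-ins for a storage call that returns a bounded slice of one vertex's adjacency list.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical in-memory stand-in for a storage backend: vertex -> sorted edge ids.
ADJACENCY: Dict[str, List[int]] = {"supernode": list(range(10))}


def fetch_edge_page(vertex: str, after: Optional[int], limit: int) -> List[int]:
    """Return at most `limit` edge ids of `vertex` that sort after `after`."""
    edges = ADJACENCY.get(vertex, [])
    if after is not None:
        edges = [e for e in edges if e > after]
    return edges[:limit]


def iter_edges(vertex: str, page_size: int = 4) -> Iterator[int]:
    """Stream edges page by page so only one page is held in memory at a time."""
    after: Optional[int] = None
    while True:
        page = fetch_edge_page(vertex, after, page_size)
        if not page:
            return
        yield from page
        after = page[-1]  # resume after the last edge already seen
```

The caller iterates over `iter_edges("supernode")` as if it were one list, while each underlying call stays bounded by `page_size`. A real backend would need a stable sort order on the edges for the resume key to be safe.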

Related to this area, I think the "hands-off-the-backend" approach that the Titan project took should be ditched. For systems like this, installing custom HBase filters, co-processors, etc. can be hugely beneficial in terms of performance. From talking to the DataStax folks about their new graph product, it sounds like they've done a lot to integrate the graph DB with the Cassandra nodes themselves, and I think a similar approach will be needed to move JanusGraph forward.

- Adam


Re: Low-hanging fruit for JanusGraph

HadoopMarc <m.c.d...@...>
 

I was also delighted to read the notices about JanusGraph!

High on my wishlist is better support for installing JanusGraph in any of the major Hadoop distributions. Many users have had to face version conflicts with the commonly used dependencies of Hadoop, Spark, HBase, etc. I am not sure how to handle this. For a start, we should make an inventory of version problems once the first JanusGraph release (candidate) is out.

Cheers,     Marc


Re: Low-hanging fruit for JanusGraph

plu...@...
 

> 2. Provide a well documented migration path from Titan 1.0

Have you already migrated your Titan 0.4 code over to Titan 1.0?

-- Jason


On Thursday, January 19, 2017 at 4:10:13 PM UTC-5, Austin Sharp wrote:
Hi all,

I'm extremely excited that JanusGraph is out in the world! I have been working with Titan since 0.4.x and have been hoping for a long time to see new maintainers so that bug reports, pull requests, etc. don't go unheeded.

There are a few long-standing Titan issues that would be easy for JanusGraph to pick up and run with, to immediately differentiate from the existing Titan codebase and give people like myself the excuse we're looking for to migrate over! I suspect others have their own wishlists, so I'd encourage everyone to chime in.

A few of my personal ones:

1. Update to newer versions of dependencies (Guava 21, Cassandra 3, Elasticsearch, etc.).
2. Provide a well-documented migration path from Titan 1.0.
3. Keep up to date with TinkerPop (I know this has been a stated goal elsewhere).
4. Handle supernodes better, for instance by streaming - i.e., when traversing over edges, don't pull all edges in from Cassandra at once. This is my personal bugbear - we keep having to change our schema and use indices or properties when edges are by far the best fit for the model, because Titan can easily blow through the Cassandra frame size even if you set up vertex partitioning to split adjacency lists, among other issues.

Excited to see where things go, and hopefully I can switch to using JanusGraph ASAP!


Re: Low-hanging fruit for JanusGraph

plu...@...
 

> Related to this area, I think the "hands-off-the-backend" approach that the Titan project took should be ditched.

I've heard this sentiment before, but my guess is that it will be a while before it gets entertained. That being said, I think it's an interesting topic that would be better discussed over on janusgraph-dev.

-- Jason


On Thursday, January 19, 2017 at 8:59:24 PM UTC-5, Adam Phelps wrote:
On 1/19/17 1:06 PM, Austin Sharp wrote:
> 4. Handle supernodes better, for instance by streaming - i.e., when
> traversing over edges, don't pull all edges in from Cassandra at once.
> This is my personal bugbear - we keep having to change our schema and
> use indices or properties when edges are by far the best fit for the
> model, because Titan can easily blow through the Cassandra frame size
> even if you set up vertex partitioning to split adjacency lists, among
> other issues.

This is a big one for us as well, and while we have stuck with a schema
that allows supernodes, we've made all sorts of workarounds in our Java
code which accesses Titan.

In our case we're dealing with HBase underneath, so the
limits are somewhat different. However, similar solutions should be
applicable to any backend, either by changing the row structure or by
having the clients page through the results with multiple calls.

Related to this area, I think the "hands-off-the-backend" approach that
the Titan project took should be ditched. For systems like this,
installing custom HBase filters, co-processors, etc. can be hugely
beneficial in terms of performance. From talking to the DataStax folks
about their new graph product, it sounds like they've done a lot to
integrate the graph DB with the Cassandra nodes themselves, and I think a
similar approach will be needed to move JanusGraph forward.

- Adam


Re: Low-hanging fruit for JanusGraph

Austin Sharp <austins...@...>
 

Yes, I've been on Titan 1.0 for quite some time now. I meant from Titan 1.0 to Titan 1.1 / JanusGraph. Maybe very little is needed - I haven't looked into the unreleased Titan 1.1 changes much.


On Friday, January 20, 2017 at 10:33:29 AM UTC-8, Jason Plurad wrote:
> 2. Provide a well documented migration path from Titan 1.0

Have you already migrated your Titan 0.4 code over to Titan 1.0?

-- Jason


JanusGraph seems to insist on Elastic Search

rohit.j...@...
 

Since I did not want to use an index (performance is not of the essence at the current time) and did not want to use either Elasticsearch or Solr, I used the following in Gremlin:

graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')


However, that gives me this error:

Could not instantiate implementation: org.janusgraph.diskstorage.es.ElasticSearchIndex

So I figured I would just use SOLR already installed on the CDH deployment I was running on.  Then I get this error:

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-hbase-solr.properties')

23:33:13 WARN  org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration  - Local setting index.search.backend=solr (Type: GLOBAL_OFFLINE) is overridden by globally managed value (elasticsearch).  Use the ManagementSystem interface instead of the local configuration to control this setting.

23:33:13 WARN  org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration  - Local setting index.search.solr.mode=http (Type: GLOBAL_OFFLINE) is overridden by globally managed value (cloud).  Use the ManagementSystem interface instead of the local configuration to control this setting.

Could not instantiate implementation: org.janusgraph.diskstorage.es.ElasticSearchIndex


It seemed that Titan had a ManagementSystem, but perhaps JanusGraph doesn't? Anyway, how do I avoid using any index, or use Solr instead of Elasticsearch?


Rohit


Re: JanusGraph seems to insist on Elastic Search

Jason Plurad <plu...@...>
 

Try dropping the existing HBase table first (default HBase table name is janusgraph), and then create a new graph with graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties') which does not use any index provider.

Most likely what happened was you used the janusgraph-hbase-es.properties first, which failed to connect to Elasticsearch, but the ES configuration properties were stored in the table itself.
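
The warnings quoted in this thread suggest the following behaviour, sketched here as a toy Python model rather than JanusGraph's actual implementation: once an option has been written to the storage backend, the stored value wins over whatever the local properties file says. The function name `resolve_setting` and both dictionaries are illustrative assumptions, not JanusGraph APIs.

```python
from typing import Dict, List, Tuple


def resolve_setting(
    key: str, local: Dict[str, str], stored: Dict[str, str]
) -> Tuple[str, List[str]]:
    """Toy model: a value already stored with the graph overrides the local file."""
    warnings: List[str] = []
    if key in stored:
        if key in local and local[key] != stored[key]:
            warnings.append(
                f"Local setting {key}={local[key]} is overridden by "
                f"globally managed value ({stored[key]})."
            )
        return stored[key], warnings
    # Nothing stored yet (for example, a freshly created table): local value applies.
    return local[key], warnings


# First open used the Elasticsearch properties file, so that choice was persisted.
stored = {"index.search.backend": "elasticsearch"}
# A later open with the Solr properties file does not change the stored value.
local = {"index.search.backend": "solr"}
value, warnings = resolve_setting("index.search.backend", local, stored)
```

In this model, dropping the table corresponds to an empty `stored` dictionary, which is why a fresh graph opened afterwards picks up only what the local properties file specifies.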

-- Jason

On Monday, January 23, 2017 at 12:00:14 PM UTC-5, Rohit Jain wrote:

Since I did not want to use an index (performance is not of the essence at the current time) and did not want to use either Elasticsearch or Solr, I used the following in Gremlin:

graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')


However, that gives me this error:

Could not instantiate implementation: org.janusgraph.diskstorage.es.ElasticSearchIndex

So I figured I would just use SOLR already installed on the CDH deployment I was running on.  Then I get this error:

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-hbase-solr.properties')

23:33:13 WARN  org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration  - Local setting index.search.backend=solr (Type: GLOBAL_OFFLINE) is overridden by globally managed value (elasticsearch).  Use the ManagementSystem interface instead of the local configuration to control this setting.

23:33:13 WARN  org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration  - Local setting index.search.solr.mode=http (Type: GLOBAL_OFFLINE) is overridden by globally managed value (cloud).  Use the ManagementSystem interface instead of the local configuration to control this setting.

Could not instantiate implementation: org.janusgraph.diskstorage.es.ElasticSearchIndex


It seemed that Titan had a ManagementSystem, but perhaps JanusGraph doesn't? Anyway, how do I avoid using any index, or use Solr instead of Elasticsearch?


Rohit


Re: JanusGraph seems to insist on Elastic Search

Rohit Jain <rohit.j...@...>
 

That was it Jason!  Thanks!  Could use a better error :-)


On Monday, January 23, 2017 at 11:17:19 AM UTC-6, Jason Plurad wrote:
Try dropping the existing HBase table first (default HBase table name is janusgraph), and then create a new graph with graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties') which does not use any index provider.

Most likely what happened was you used the janusgraph-hbase-es.properties first, which failed to connect to Elasticsearch, but the ES configuration properties were stored in the table itself.

-- Jason


Re: JanusGraph seems to insist on Elastic Search

Rohit Jain <rohit.j...@...>
 

However, when I do the next step in the tutorial I get the following error:
gremlin> GraphOfTheGodsFactory.load(graph)
Unknown external index backend: search
Display stack trace? [yN] N

On Monday, January 23, 2017 at 3:23:05 PM UTC-6, Rohit Jain wrote:
That was it Jason!  Thanks!  Could use a better error :-)


Re: JanusGraph seems to insist on Elastic Search

Jason Plurad <plu...@...>
 

If you are trying to do the tutorial without an index backend, you need to load the graph using this command instead:

GraphOfTheGodsFactory.load(graph, null, true)

https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/example/GraphOfTheGodsFactory.java#L66

-- Jason


On Monday, January 23, 2017 at 4:27:03 PM UTC-5, Rohit Jain wrote:
However, when I do the next step in the tutorial I get the following error:
gremlin> GraphOfTheGodsFactory.load(graph)
Unknown external index backend: search
Display stack trace? [yN] N


Re: JanusGraph seems to insist on Elastic Search

Rohit Jain <rohit.j...@...>
 

Thanks Jason!  That worked.


On Monday, January 23, 2017 at 4:16:53 PM UTC-6, Jason Plurad wrote:
If you are trying to do the tutorial without an index backend, you need to load the graph using this command instead:

GraphOfTheGodsFactory.load(graph, null, true)

-- Jason


Query Execution

karthi...@...
 

Hi,

I was wondering whether a single query/traversal executes on a single node or whether its execution is distributed across nodes. I understand that the storage is distributed across many machines with HBase/Cassandra, but how is the execution of the query handled?

I am new here, so please excuse my lack of knowledge. 

Thanks.

- Karthik


Maven coordinates for Janus graph

akris...@...
 

What are the Maven coordinates for downloading the JanusGraph jars? Are there any?

Regards
Ajay krishnan


Re: Query Execution

HadoopMarc <m.c.d...@...>
 

Hi Karthik,

The query execution is carried out by the JanusGraph instance, see http://docs.janusgraph.org/0.1.0-SNAPSHOT/arch-overview.html

The JanusGraph instance can be your own application with JanusGraph dependencies (embedded operation), the Gremlin Console, or the JanusGraph/Gremlin Server. The JanusGraph instance in turn depends on TinkerPop.

Cheers,    Marc

On Wednesday, January 25, 2017 at 3:55:57 PM UTC+1, karthik tunga wrote:

Hi,

I was wondering whether a single query/traversal executes on a single node or whether its execution is distributed across nodes. I understand that the storage is distributed across many machines with HBase/Cassandra, but how is the execution of the query handled?

I am new here, so please excuse my lack of knowledge. 

Thanks.

- Karthik


Re: Maven coordinates for Janus graph

HadoopMarc <m.c.d...@...>
 

Hi Ajay,

I believe there is only the git repo for now:  https://github.com/JanusGraph/janusgraph

You can clone the repo and do a local build, which will put the jars in your local .m2 repo; see the pom.xml for the Maven coordinates. If you want installable zip archives for the console and server, I think (reading from the pom.xml) you can still use the release profile and get the archives in the target directories of console and server (I did not try it myself), so:

mvn clean install -DskipTests -Dgpg.skip -Pjanusgraph-release

Cheers,    Marc

On Wednesday, January 25, 2017 at 7:34:51 PM UTC+1, ak...@... wrote:

What are the Maven coordinates for downloading the JanusGraph jars? Are there any?

Regards
Ajay krishnan


partition methods on JanusGraph

Demai <nid...@...>
 

Hi folks,

First, I'm very glad to see someone picking up Titan.

I am new in this area and looking for a couple of pointers. Does JanusGraph (Titan) rely on the datastore (such as HBase) for data partitioning? HBase uses something more like range partitioning on the key, which is probably not the best way to distribute graph data across different servers. Hence the question.

thanks in advance.

Demai


Re: partition methods on JanusGraph

Jerry He <jerr...@...>
 

Hi, Demai

JanusGraph/Titan takes advantage of the partition/shard capability of the storage backend.

This is a high-level description of the graph partition. 
http://docs.janusgraph.org/0.1.0-SNAPSHOT/graph-partitioning.html

This is the backend storage data model.
http://docs.janusgraph.org/0.1.0-SNAPSHOT/data-model.html

Jerry


Re: partition methods on JanusGraph

Demai <nid...@...>
 

Jerry, thanks a lot. I appreciate the pointers. Demai


On Friday, January 27, 2017 at 9:54:23 AM UTC-8, Jerry He wrote:
Hi, Demai

JanusGraph/Titan takes advantage of the partition/shard capability of the storage backend.

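This is a high-level description of the graph partition.
http://docs.janusgraph.org/0.1.0-SNAPSHOT/graph-partitioning.html

This is the backend storage data model.
http://docs.janusgraph.org/0.1.0-SNAPSHOT/data-model.html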

Jerry
