
[PROPOSAL] Default query.fast-property to TRUE

Ted Wilmes <twi...@...>
 

Currently, query.fast-property defaults to false, which means that vertex property retrievals require repeated trips to the storage backend.  Issue 104 proposes changing the default back to true.  With that setting, the first retrieval of a vertex property also retrieves all of the vertex's other properties.  This may be prohibitively expensive when vertices carry a large number of properties, but I believe that in the majority of cases the reduction in storage round trips will outweigh the extra property payload size.

Having said that, we'd like further feedback if anyone thinks that changing this default is not a good idea.

I'll leave this proposal open for 72 hours, then assume lazy consensus and move forward if no objections are lodged.

Thanks,
Ted
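
The trade-off described above can be sketched with a toy model. This is not JanusGraph code: ToyBackend, ToyVertex, and round_trips_for are hypothetical names, and the model only counts calls to a pretend storage backend. It assumes, as the proposal describes, that with query.fast-property=false each uncached property costs its own backend call, while with query.fast-property=true the first access pulls every property of the vertex at once.

```python
# Toy model only: NOT JanusGraph code. All class and function names here are
# hypothetical and exist purely to count backend round trips.


class ToyBackend:
    """Pretends to be a storage backend and counts how often it is called."""

    def __init__(self, rows):
        self.rows = rows  # {vertex_id: {property_key: value}}
        self.round_trips = 0

    def fetch_one(self, vertex_id, key):
        # One call returns a single property value.
        self.round_trips += 1
        return self.rows[vertex_id][key]

    def fetch_all(self, vertex_id):
        # One call returns every property of the vertex (larger payload).
        self.round_trips += 1
        return dict(self.rows[vertex_id])


class ToyVertex:
    def __init__(self, backend, vertex_id, fast_property):
        self.backend = backend
        self.vertex_id = vertex_id
        self.fast_property = fast_property
        self.cache = {}

    def value(self, key):
        if key not in self.cache:
            if self.fast_property:
                # First miss prefetches all properties in a single call.
                self.cache.update(self.backend.fetch_all(self.vertex_id))
            else:
                # Each uncached key costs its own call.
                self.cache[key] = self.backend.fetch_one(self.vertex_id, key)
        return self.cache[key]


def round_trips_for(fast_property, keys):
    """Read the given keys from one vertex and report backend calls made."""
    backend = ToyBackend({1: {"name": "saturn", "age": 10000, "type": "titan"}})
    vertex = ToyVertex(backend, 1, fast_property)
    for key in keys:
        vertex.value(key)
    return backend.round_trips
```

In this model, reading three properties costs three calls with the flag off and one call with it on. The cost of the flag shows up only as payload size: the single call returns every property even when only one is needed, which is the case the proposal flags for vertices with many properties.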


Re: HBase table definition and how flexible to change it?

Jerry He <jerr...@...>
 

https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-hbase-parent/janusgraph-hbase-core/src/main/java/org/janusgraph/diskstorage/hbase/HBaseStoreManager.java#L246

This tells you the column family mappings. 

Jerry


Re: HBase table definition and how flexible to change it?

Demai Ni <nid...@...>
 

Jerry,

Thanks a lot. That is exactly what I was looking for.

Demai

On Mon, Feb 13, 2017 at 3:12 PM, Jerry He <jerr...@...> wrote:

--
You received this message because you are subscribed to the Google Groups "JanusGraph developer list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [PROPOSAL] Replace FulgoraGraphComputer by IgniteGraphActors/IgniteGraphComputer

dzso...@...
 

Hey, we already have a tinkerpop-ignite implementation as part of an in-house project. We'd be happy to merge it with Janus, if we can work together.

You are right about the convergence of OLTP and OLAP via Ignite. That's why we chose it in the first place. We avoided using IgniteRDD and based everything on IgniteCache, for this reason as well. When you have TP and AP tasks running on the same data, you need a series of new designs and mechanisms to manage consistency.

For the distributed TP implementation or actually, asynchronous parallelism, we may not follow the actor model. Consider this: http://vertx.io/. So it will involve changes to Janus and TinkerPop.

Hence, this merger won't be straightforward. Merging software never is...

I'm sure we can work out the details. But that requires we all know and agree on where we are heading. This approach leads to an alternative data system, with a unique set of features and tradeoffs. We think of it as a data system designed for data science, enabling data-centric computing, end-to-end.

And that's why we didn't fork Titan, because it contained design decisions that don't quite align with the goal of OLTP-OLAP convergence.

Anyway, let's talk :)

song


On Friday, January 27, 2017 at 5:34:48 AM UTC+8, Dylan Bethune-Waddell wrote:
So I've been toying with this idea for a while, but I want to make it concrete here so we can shoot it down, mutate, or build it up as needed:

I think that we should on-board an additional backend for distributed, asynchronous OLAP/OLTP "hybrid" processing, which Marko Rodriguez is implementing a reference implementation for right now in TinkerPop (link to branch and JIRA). Here's a link to a figure he attached to the JIRA issue. We need to drop or change the name of FulgoraGraphComputer anyways, so my suggestion is "repeal and replace" - let's take a stab at providing an implementation of Marko's new work, which he is doing with Akka, whereby we will use Apache Ignite as a backend instead with the goal of addressing a concern raised/discussed in the JIRA issue:
  • If transactions are worked out, then distributed OLTP Gremlin provides mutation capabilities (something currently not implemented for GraphComputer). That is, addV, addE, drop, etc. just work. *Caveat: transactions in this environment across GremlinServer seem difficult.*

Indeed, which is why I propose we try out Apache Ignite as a way to achieve this functionality while replacing our in-memory, single-machine graph computer (Fulgora) with a distributed in-memory graph computer that will suit most OLAP-style analytics needs with the benefit of full OLTP TinkerPop API functionality and integration points with Spark and tools that work with data stored in HDFS. Quoting from this article, which admittedly reads as a long-form advertisement for Apache Ignite:
  • Ignite can provide shared storage, so state can be passed from one Spark application or job to another
  • Ignite can provide SQL with indexing so Spark SQL can be accelerated over 1,000x
  • When working with files instead of RDDs, the Apache Ignite In-Memory File System (IGFS) can also share state between Spark jobs and applications
My understanding is that Apache Ignite essentially provides a distributed In-Memory cache that is compatible with and a "drop-in" value add to both HDFS and Apache Spark with an integration point for Cassandra as well. To me the RDD-like data structure of Ignite which maintains state complete with ACID transactions via an SQL interface would therefore address Marko's concern about distributed transactions. Here are a few jumping-off points we could learn from:

1. SQLG - MIT Licensed by Pieter Martin, who is quite active on the TinkerPop mailing lists and JIRA. We could ask nicely if he would like to lend a hand, as I am looking at the SQL interface as the most sensible target here.
2. Marko Rodriguez's reference implementation of GraphActors with Akka - as mentioned above this is an active body of work that has not yet been released or merged into TinkerPop master, but is what I suggest we target here.
3. Ted Wilmes' SQL-gremlin - while this goes "the other way" and maps a TinkerPop-enabled database to a tabular representation so that you can run SQL queries over the data, I'm guessing we'll see plenty of overlapping gotchas that Ted already ran into.
4. SparkGraphComputer - or "the thing that already works". Apache Ignite shadowing the Spark/Hadoop APIs might put a drop-in IgniteGraphComputer within reach which would give us an idea of how performant and useful we could expect the system to be overall before we invest in the "big change" of IgniteGraphActors or whatever middle-ground between the GraphActors, GraphComputer, and GraphProvider frameworks we'll need to find to realise an Apache Ignite backend within JanusGraph.

I also wanted to mention my thoughts on IGFS (Ignite File System) which runs either on top of HDFS, between HDFS and Spark, or standalone (I believe). My thinking is that we can store side-effect data structures in IGFS and it will enable the same ACID-ness on distributed side-effect data structures we would be getting for elements in the Graph/RDD data structure via IgniteRDD or what have you. From there, persistence to HDFS via a BulkDumperVertexProgram or BulkExportVertexProgram would be possible, as would running spark job chains with addE()/V() and drop() on the IgniteRDD or transformations thereof, opening up a path to ETL type workflows involving other "Big Data" creatures/tools. Further, we could then persist back into JanusGraph with a BulkLoaderVertexProgram implementation. So again, this is somewhat of a GraphComputer/GraphActors hybrid, but I'm not sure I mind. Jonathan Ellithorpe mentioned his implementation of TinkerPop over RAMcloud a while back on the TinkerPop mailing list as part of a benchmarking effort - we could ask him about how performant that was as it sounds similar to what this would be. Benchmarks would be nice, too :)

I'm interested in what people think of on-boarding this kind of processing engine in principle, even if all the optimistic assumptions of the feasibility of Ignite I have made here turn out to be unfounded. Are there other options we should consider besides Ignite, or should we stick closer to home and simply implement the GraphActors/Partitioner/Partitions standard Marko is working on directly with Cassandra/HBase as a giant refactor over time? Clearly, this is a change we can move up or down our development schedule and spend a while getting right, but if performant I see a lot of value here.


Re: [PROPOSAL] Replace FulgoraGraphComputer by IgniteGraphActors/IgniteGraphComputer

Demai Ni <nid...@...>
 

Song,

Great that you guys already have an implementation with Ignite. Would you please share some performance numbers? For example, compared to Titan-on-HBase, how much does Ignite improve things?  My team is considering Janus+HBase at this moment, but we are open to better solutions. Thanks

Demai

 

On Wed, Feb 15, 2017 at 1:55 AM, <dzso...@...> wrote:


Re: [PROPOSAL] Replace FulgoraGraphComputer by IgniteGraphActors/IgniteGraphComputer

song <dzso...@...>
 

Hi Demai,

Preliminary tests showed slower performance than tinkerpop-spark in OLAP (largely due to inherent constraints of Ignite) and faster OLTP than titan-cassandra in most cases (largely because we are using a cache). We will prepare benchmark numbers as we draw closer to the GA release (early March).

Frankly speaking, this won't be the fastest OLTP or OLAP engine in either category. We can't beat application-specific constructs and algorithms on top of Ignite or Spark, because we are adding a general-purpose abstraction. With a graph system that truly integrates OLTP and OLAP, we should expect to lose some performance, but gain a lot of flexibility.

Our first version was actually based on Titan, and we killed it after a few months and a lot of debate. Its constructs became constraints as we were integrating OLTP and OLAP. It made more sense to start with a blank slate.

Song


On Thursday, February 16, 2017 at 2:40:49 AM UTC+8, Demai wrote:


Plan for ElasticSearch Support

Jaguar Xiong <xiong...@...>
 

Hi,
    Is there a plan to upgrade the ElasticSearch support? Currently, the project is built against ES 1.5.1, and there are a lot of breaking changes in the ElasticSearch API in 2.x and 5.x.
    I'm interested in porting/adding support for newer ES, but is there any interest in supporting multiple versions of ES, similar to janusgraph-hbase?

Thanks!
Jaguar Xiong


Re: Plan for ElasticSearch Support

Misha Brukman <mbru...@...>
 

Yes, there's ongoing work in https://github.com/JanusGraph/janusgraph/pull/79 to update Elasticsearch.

On Wed, Mar 1, 2017 at 1:12 AM, Jaguar Xiong <xiong...@...> wrote:


[DISCUSS] Elasticsearch Http using Jest

Keith Lohnes <loh...@...>
 

I started some conversation over at https://github.com/JanusGraph/janusgraph/pull/79#pullrequestreview-24343839, and Jason Plurad suggested I move that over here. 

I have some code that's been used in a Titan deployment using the Apache-licensed Jest ES HTTP client. There was some discussion in that PR about whether to continue to support the Transport/Node client as well.

The key points of the conversation there:
1. Versions to support (1.x, 2.x, 5.x)

   With the Jest client, we could support all three pretty easily. Switching between 1.x, 2.x, and 5.x would mean swapping out the .jar.  There are some open PRs in the Jest repo that need to get merged for 5.x support, but once they are and we update the jar version, we'd be able to support 5.x. Maintaining 1.x support for a little while would be nice for people with production Titan instances, as Adam Phelps pointed out. 1.x and 2.x could use the same code; they just need different jars.

2. HTTP vs Transport/Node
    I think in #92 there's a mention of Transport being deprecated. My first instinct is to say that Janus should mark Transport/Node as deprecated and continue to support Transport/Node clients until a major version release, at which point support could be removed.  I have some work done to split out the Transport/Node clients from the HTTP client, which would make for an easy removal once that decision has been made.
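
The deprecation path in point 2 could look roughly like the following sketch. Everything here is hypothetical: the interface names, the resolve_interface helper, and the transport_removed flag are illustrations of "warn until the next major release, then remove", not existing JanusGraph configuration or code.

```python
import warnings

# Hypothetical interface names; real configuration values may differ.
HTTP = "HTTP"
DEPRECATED_INTERFACES = {"TRANSPORT", "NODE"}


def resolve_interface(configured, transport_removed=False):
    """Return the normalized client interface name.

    While transport_removed is False (before the major release), the
    Transport/Node clients still work but emit a DeprecationWarning.
    Once it is True, selecting them is an error.
    """
    name = configured.upper()
    if name == HTTP:
        return name
    if name in DEPRECATED_INTERFACES:
        if transport_removed:
            raise ValueError(f"{name} client support has been removed; use {HTTP}")
        warnings.warn(
            f"{name} client is deprecated and will be removed in a future "
            f"major release; use {HTTP}",
            DeprecationWarning,
            stacklevel=2,
        )
        return name
    raise ValueError(f"unknown Elasticsearch interface: {configured}")
```

Keeping the check in one place is what makes the later removal cheap: dropping Transport/Node support would amount to flipping the flag (or deleting the branch) rather than touching every caller.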


Re: [DISCUSS] Elasticsearch Http using Jest

Alexander Patrikalakis <amcpatr...@...>
 

resending because I made a typo: hadoop -> hbase

We could split ES support much like we do for hbase (098, 10, etc.): have a janusgraph-es/janusgraph-es-core that JanusGraph uses, and shims like janusgraph-es1, janusgraph-es2-transport, and janusgraph-es5-jest. Since we are newly introducing 2.x support in the in-flight PR and the coding is already done, perhaps we only add transport support on the es2 lineage and only support HTTP/Jest on the es5 lineage?


On Friday, March 3, 2017 at 12:40:07 AM UTC+9, Keith Lohnes wrote:


Re: [DISCUSS] Elasticsearch Http using Jest

Alexander Patrikalakis <amcpatr...@...>
 

In the scheme below if 1.5=transport and 2.x=transport and 5.x=http/jest, then we don't need to pull out protocols in the shim names, so the shims would just be:
janusgraph-es/janusgraph-es1
janusgraph-es/janusgraph-es2
janusgraph-es/janusgraph-es5
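
As a rough sketch of this scheme, the version-to-shim mapping could be expressed as a small lookup. The module paths and protocols are taken from the message above; the SHIMS table and the shim_for helper are hypothetical and only illustrate how one protocol per ES lineage keeps the protocol out of the shim names.

```python
# Sketch of the proposed layout. Module paths and protocols come from the
# message above; the lookup helper itself is hypothetical.
SHIMS = {
    1: ("janusgraph-es/janusgraph-es1", "transport"),
    2: ("janusgraph-es/janusgraph-es2", "transport"),
    5: ("janusgraph-es/janusgraph-es5", "http/jest"),
}


def shim_for(es_version):
    """Map an Elasticsearch version string to (shim module, client protocol)."""
    major = int(es_version.split(".")[0])
    try:
        return SHIMS[major]
    except KeyError:
        raise ValueError(f"no shim proposed for Elasticsearch {es_version}") from None
```

For example, shim_for("1.5.1") selects the es1 shim with the transport client, while any 5.x version selects the es5 shim with the HTTP/Jest client.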


On Saturday, March 4, 2017 at 12:07:24 AM UTC+9, Alexander Patrikalakis wrote:


Re: [PROPOSAL] Default query.fast-property to TRUE

Alexander Patrikalakis <amcpatr...@...>
 

I'm in support of this. I did not see a convincing need for defaulting to false in Titan, and it has the potential to double the latency of traversals due to lazy loading.


On Tuesday, February 14, 2017 at 1:42:39 AM UTC+9, Ted Wilmes wrote:


Re: [DISCUSS] Elasticsearch Http using Jest

sjudeng <sju...@...>
 

I think JanusGraph should end formal support for 1.x, though it would be great if users could still have the ability to use it (even if unsupported/untested) via your Jest-based implementation. For one I'm biased as the author of the above PR, which does drop support for 1.x and which I'd really like to see merged before moving forward. More importantly though, in working through updates to support 5.x I was happy to find the Elasticsearch distribution zip artifacts available for 2.x and 5.x (but not 1.x) in Maven Central. The availability of the ES distribution artifacts for 2.x and 5.x enables their automated use in unit tests and in building JanusGraph releases (e.g. for use in running embedded ES instances). This allows JanusGraph to remove the hacked elasticsearch and elasticsearch.in.sh scripts from janusgraph-dist and also avoids the JarHell issues both during testing and when starting embedded ES instances. This improves maintainability and stability.

The proposed update to support an HTTP ES client via Jest sounds great to me. Personally I think it would be nice to avoid adding compatibility shims to enable the cross-version support. But either way I do want to make sure that testing rigor is not lost in the updates. I think formal/full support for any ES version should require that the full janusgraph-es test suite can be run (automatically) against that version and that corresponding embedded ES instances for that version are supported through janusgraph-dist. I have tested that this works for 2.x and 5.x, but I'm not sure that it would work for 1.x ... at least not without really going backwards/hacking project configuration.

If you're able to make this work to support the full test suite/embedded instances across the three versions cleanly, that'd be great. Otherwise I'd propose that the docs be updated to indicate that JanusGraph fully supports 2.x and 5.x but that 1.x is no longer maintained (read "tested") though it can still be used if necessary through the relevant Jest-jar change. It seems to me this would be sufficient. JanusGraph has already started down the road of making breaking changes in the course of moving beyond Titan, including dropping support for old versions of HBase and updating TinkerPop, which because of the underlying Spark update would require an update to users' compute clusters.


On Thursday, March 2, 2017 at 9:40:07 AM UTC-6, Keith Lohnes wrote:
I started some conversation over at https://github.com/JanusGraph/janusgraph/pull/79#pullrequestreview-24343839, and Jason Plurad suggested I move that over here. 

I have some code that's been used in a Titan deployment using the apache licensed Jest ES http client. There was some discussion in that PR about whether to continue to support the Transport/node client in there as well.

The key points of the conversation there
1. Versions to support (1.x, 2.x, 5.x)

   With the Jest client, we could support all three pretty easily. 1 vs 2.x and 5.x would be changing the .jar out.  There's some open PRs in the Jest repo that need to get merged for 5.x support, but once those are and we update the jar version, we'd be able to support 5.x. Maintaining 1.x support for a little while would be nice for people with production Titan instances, as Adam Phelps pointed out. 1.x and 2.x could use the same code, they just need different jars.

2. HTTP vs Transport/Node
    I think in #92 there's a mention of Transport being deprecated. My first instinct is to say that Janus should mark Transport/Node as deprecated and continue to support Transport/Node clients until a major version release at which point support could be removed.  I have some work done to split out the Transport/Node clients from the Http client, and make for an easy removal once that decision has been made.


Re: [DISCUSS] Elasticsearch Http using Jest

Jason Plurad <plu...@...>
 

It's also similar to the split that we'll have with Cassandra with cassandra-thrift and cassandra-cql.

Since ES is going in the direction of HTTP-only client access (good read here), it might make more sense to use their client API rather than Jest. With ES introducing their own Java REST Client with version 5.0.0, I wonder what Jest's longevity would be, other than perhaps backwards compatibility with pre-5.0 ES versions.


On Friday, March 3, 2017 at 10:08:57 AM UTC-5, Alexander Patrikalakis wrote:
In the scheme below if 1.5=transport and 2.x=transport and 5.x=http/jest, then we don't need to pull out protocols in the shim names, so the shims would just be:
janusgraph-es/janusgraph-es1
janusgraph-es/janusgraph-es2
janusgraph-es/janusgraph-es5


On Saturday, March 4, 2017 at 12:07:24 AM UTC+9, Alexander Patrikalakis wrote:
resending because I made a typo: hadoop -> hbase

We could do a split for ES much like the split that we do for hbase (098, 10 etc): have a janusgraph-es/janusgraph-es-core that JanusGraph uses, and shims like janusgraph-es1, janusgraph-es2-transport, and janusgraph-es5-jest. Since we are newly introducing 2.x support in the in-flight PR and the coding is already done, perhaps we only add transport support on the es2 lineage and only support HTTP/Jest on the es5 lineage?



Re: [DISCUSS] Elasticsearch Http using Jest

Keith Lohnes <loh...@...>
 

I proposed Jest for a few reasons. The ES REST client is rather low level, as mentioned in #92. Jest is a much higher-level client that adds a lot of niceties for Java. I think that alone will keep Jest around for a while.

WRT versioning, unless there's another reason aside from Jest, 2.x and 5.x could be combined as they'll likely use the same Jest jar version.

Jest also makes it simple to use the same code across 1.x, 2.x, and 5.x versions for constructing ES queries. The code I've written doesn't use anything that would change/break between the two different versions of Jest that would need to be used for 1.x and 2+ compatibility, meaning we could easily introduce HTTP for all 3 versions.

Either way, there's no additional effort to introduce an HTTP interface in 2.x vs 5.x alone.

As I mentioned in #79, there are a couple of PRs that would need to be merged in Jest for 5.x to work completely correctly.




Re: [DISCUSS] Elasticsearch Http using Jest

sjudeng <sju...@...>
 

Regarding the compatibility shim approach, I think this should be avoided if at all possible. I don't think using Jest gets away from needing version-specific ES client code to support other (node/transport) clients in the code base (unless we drop node/transport clients in favor of HTTP-only), and definitely to support running embedded ES in testing/release. Unless I'm wrong about this, if we did want to take the compatibility shim approach I think we'd end up needing to create separate JanusGraph releases tied to the specific version of ES. This is not currently necessary as one JanusGraph release can service all versions for relevant modules (e.g. hbase), though I don't know if this will come up again with the cassandra-cql work. I really don't think this complexity should be introduced just to continue supporting ES 1.x.


Re: [DISCUSS] Elasticsearch Http using Jest

sjudeng <sju...@...>
 

Although the more I think about it, I guess this issue is going to be present no matter what until we can go full HTTP. Just to throw it out there, why not drop node/transport and go full HTTP? It's the future anyway, we can do it right now to support 2.x and 5.x and do it even cleaner once the Jest PRs are merged. Then we have a single JanusGraph distribution that supports ES 1.x-5.x. Users only have to update their configs to change 9300 to 9200.
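
As a rough illustration of the "change 9300 to 9200" step, here is a minimal sketch that rewrites the port in a properties-style config. The key name `index.search.port` is an assumption about how a given deployment names its index backend port, and `EsPortMigration` is a hypothetical helper, not something shipped with JanusGraph; 9300 and 9200 are the conventional Elasticsearch transport and HTTP ports.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class EsPortMigration {
    // Conventional Elasticsearch ports: transport vs HTTP.
    static final String TRANSPORT_PORT = "9300";
    static final String HTTP_PORT = "9200";

    // Assumed key name; adjust to whatever the deployment actually uses.
    static final String PORT_KEY = "index.search.port";

    /** Parses properties-file text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns a copy with the transport port swapped for the HTTP port. */
    static Properties toHttp(Properties original) {
        Properties updated = new Properties();
        updated.putAll(original);
        // Only touch the port if it is still the transport default.
        if (TRANSPORT_PORT.equals(updated.getProperty(PORT_KEY))) {
            updated.setProperty(PORT_KEY, HTTP_PORT);
        }
        return updated;
    }

    public static void main(String[] args) {
        Properties before = parse(
                "index.search.backend=elasticsearch\n"
              + "index.search.hostname=127.0.0.1\n"
              + "index.search.port=9300\n");
        Properties after = toHttp(before);
        System.out.println("before: " + before.getProperty(PORT_KEY));
        System.out.println("after:  " + after.getProperty(PORT_KEY));
    }
}
```

A custom port is left alone on purpose, since a non-default value usually means someone chose it deliberately.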




Re: [DISCUSS] Elasticsearch Http using Jest

Keith Lohnes <loh...@...>
 

I don't see a problem with just going full HTTP; it would definitely make things easier for me. But the Jest jars for 1.x vs 2.x+ are going to need to be different. I'm not sure what the preferred method of dealing with that would be.




Re: [DISCUSS] Elasticsearch Http using Jest

sjudeng <sju...@...>
 

I think you said the Jest 2.x jars should work with ES 2.x and 5.x, right? Then I'm back to my suggestion to drop formal support for ES 1.x but documentation could be updated to provide workaround steps (e.g. manually delete jest-2.x.jar and download/add jest-1.x.jar to classpath) to allow (untested) support for legacy ES 1.x deployments. It's great Jest gives us this option for basically free because I really don't think JanusGraph should introduce build/release complexity just to accommodate it. In my opinion if users have really stable Titan deployments and they're not able to update relevant cluster components (storage, indexing, compute), then I'd think they should stay on that baseline until the new capabilities being offered by JanusGraph are compelling enough to warrant the upgrade investment. Otherwise you're just upgrading to change names from Titan to JanusGraph. If this is a step some users want then I'd recommend JanusGraph create an initial release based on an earlier commit after name changes but before the potentially breaking updates to hbase, tinkerpop and elasticsearch.
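
The support matrix suggested above can be written down as a small lookup. This is only a sketch of the proposal as worded in this thread: `JestJarChooser` is a hypothetical name, the jar names are the placeholders used above (jest-1.x.jar, jest-2.x.jar) rather than real artifact versions, and ES 5.x working with the Jest 2.x jar assumes the pending Jest PRs get merged.

```java
// Hypothetical lookup, for illustration only.
final class JestJarChooser {

    enum Support { TESTED, UNTESTED }

    /** Maps an Elasticsearch major version to the placeholder Jest jar name. */
    static String jestJarFor(int esMajor) {
        switch (esMajor) {
            case 1:
                return "jest-1.x.jar";
            case 2:
            case 5:
                // Assumes the open Jest PRs for 5.x support are merged.
                return "jest-2.x.jar";
            default:
                throw new IllegalArgumentException(
                        "Unsupported Elasticsearch major version: " + esMajor);
        }
    }

    /** ES 1.x would remain usable via the jar swap but would not be tested. */
    static Support supportFor(int esMajor) {
        jestJarFor(esMajor); // rejects versions outside the matrix
        return esMajor == 1 ? Support.UNTESTED : Support.TESTED;
    }

    public static void main(String[] args) {
        for (int major : new int[] {1, 2, 5}) {
            System.out.println("ES " + major + ".x -> " + jestJarFor(major)
                    + " (" + supportFor(major) + ")");
        }
    }
}
```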


Re: [DISCUSS] Elasticsearch Http using Jest

Keith Lohnes <loh...@...>
 

I think you said the Jest 2.x jars should work with ES 2.x and 5.x, right?

Yup. Once those PRs are merged in the Jest project.

documentation could be updated to provide workaround steps (e.g. manually delete jest-2.x.jar and download/add jest-1.x.jar to classpath) to allow (untested) support for legacy ES 1.x deployments

+1

I personally think supporting only HTTP and going the route @sjudeng mentioned makes sense. It allows current Titan users some flexibility in their migration to Janus while not stopping progress on the ES backend.

