Date
1 - 16 of 16
[DISCUSS] Splitting janusgraph-cassandra
Samant Maharaj <samant...@...>
Hi, We ultimately ended up not taking this approach because it would mean that any code depending on the CQL backend would still end up pulling cassandra-all, astyanax and all their dependencies which is quite undesirable. We'd like to know what everyone thinks of the idea of splitting janusgraph-cassandra into core and implementation specific modules. If we implemented this, CQL could be moved in as another backend without any unrelated dependencies. I've implemented this change in a branch in our repository here: https://github.com/orionhealth/janusgraph/tree/feature/cassandra-refactor Some notes:
Samant Maharaj & Paul Kendall |
|
Jason Plurad <plu...@...>
Bump. I think the breakdown is fine. It looks similar to what's going on in the janusgraph-hbase directory. I'd suggest that the root remains `janusgraph-cassandra` rather than `cassandra` as your repo has it. On Sunday, March 26, 2017 at 10:11:48 PM UTC-4, Samant Maharaj wrote:
|
|
Samant Maharaj <samant...@...>
The reason for the simplified naming was to avoid nested modules having quite long names. For example, the nested astyanax module would be called 'janusgraph-cassandra-astyanax'. The other issue is that following that convention, the janusgraph-cassandra artifactId would now actually refer to the parent POM of all the cassandra modules which would conflict with its earlier incarnation. The other way to resolve this would be to call it 'janugraph-cassandra-parent' similar to hbase and hadoop. However this is quite an unconventional approach. Personally I'm in favour of a less verbose naming scheme but it's not something I'm too concerned about. On Wednesday, 5 April 2017 02:26:05 UTC+12, Jason Plurad wrote:
|
|
Alexander Patrikalakis <amcpatr...@...>
Yeah, we can't reuse the janusgraph-cassandra artifact id. On Wednesday, April 5, 2017 at 5:03:20 AM UTC+9, Samant Maharaj wrote:
|
|
sjudeng <sju...@...>
Samant & Paul, Thanks for your work on this. I like what you've done to clean up the project configs in the other modules in favor of the cassandra-test module. I do agree that for consistency it would be best to include the janusgraph- prefix in the artifact id, at least for the base cassandra module. Whether it can be janusgraph-cassandra or needs to be janusgraph-cassandra-parent, I don't know. Since presumably you'll retarget to 0.2.0-SNAPSHOT, as long as there are no 0.2.0 snapshot deployments before your PR gets merged, wouldn't it be fine to reuse janusgraph-cassandra in this new context? Also what about making the groupId org.janusgraph.cassandra for the sub-modules (core, embedded, etc.)? I'm just thinking of the eventual repository layout (see https://oss.sonatype.org/content/repositories/snapshots/org/janusgraph/) and wondering if it would be better organization to group the sub-modules under the additional cassandra group element. |
|
Samant Maharaj <samant...@...>
I really don't think it's advisable to reuse the janusgraph-cassandra artifactId. It's extremely unexpected for the meaning of an artifactId to change after a version bump. Essentially janusgraph-cassandra would cease to mean the jar containing the JanusGraph Cassandra backend implementation and would become the parent POM for the Cassandra modules. This would have strange effects on anyone depending on that artifactId after the version change. In regards to using different groupIds for sub-modules, that's not a bad solution but given the relatively small number of modules overall might be a bit unnecessary. I favour continuing to use the org.janusgraph groupId for all modules and dropping the janusgraph prefix both in the directories and in the artifactIds as it seems a bit redundant since all the modules are scoped under the org.janusgraph groupId. Another possibility to consider is to introduce an intermediate level for backend and index implementations: / ├── docs/ ├── dist/├── janusgraph/│ ├── core/│ └── test/├── backend/ │ ├── berkeleyje/ │ ├── cassandra/ │ | ├── core/ │ | ├── test/ │ | ├── astyanax/ │ | └── cql/ │ └── hbase/ │ ├── core/ │ ├── test/ │ ├── hbase-098/ │ └── hbase-10/ ├── index/ │ ├── elasticsearch/ │ ├── lucene/ │ └── solr/ └── hadoop/ This would allow for more easily creating shared test support projects such as for all index implementations as there would be a natural place to place such projects. Additionally the parent POMs at each group would also be a great place to manage group level build configuration. This would also simplify the root POM and make maintenance easier as configuration specific to each group could be moved into the group level poms. In terms of artifact naming, one approach could be to use the project path to create the artifactId, so the Cassandra Astyanax module would be 'backend-cassandra-astyanax'. This would be consistent with the directory structure and would communicate the purpose of the artifact as well. Such an approach would also naturally prevent any name collisions for future modules. Sorry for the long post but I thought it would be good to put up a more detailed straw man so we can continue the discussion. Regards, Samant On Thursday, 6 April 2017 12:55:22 UTC+12, sjudeng wrote:
|
|
sjudeng <sju...@...>
The broader reorganization sounds interesting but I think it would probably need a separate discussion. How about keeping the janusgraph-cassandra prefix (janusgraph-cassandra-parent, etc.) for now just for consistency with the rest of the project and then pursing the broader project reorg (dropping janusgraph prefix and/or intermediate levels, etc.) later? |
|
Samant Maharaj <samant...@...>
I think that's probably a good way to go forward. I'll change to janusgraph-cassandra-parent and if there are no objections I can create a pull request for the changes. On Tuesday, 11 April 2017 13:00:36 UTC+12, sjudeng wrote:
|
|
sjudeng <sju...@...>
You might start a new VOTE thread with your proposal since it is pretty significant. I'd specifically list out the final module removals/additions (e.g. removing org.janusgraph:janusgraph-cassandra, adding janusgraph-cassandra-parent, cassandra-thrift, cassandra-embedded, etc.) that are in the PR in case anyone hasn't looked at the repo and/or followed all the discussion here. |
|
Ted Wilmes <twi...@...>
This discussion got me thinking, with the addition of the CQL backend, should we be deprecating the current Cassandra thrift-based backends at some point? We wouldn't want to do that immediately because those have the benefit of being around for a long time and going through production use & bug fixes, but it seems like that would be the natural progression in the not too distant future. This would require an entirely different discussion, but I bring it up here because I wonder how/if doing that would change this proposal. Thanks, Ted On Wednesday, April 12, 2017 at 2:14:17 PM UTC-5, sjudeng wrote:
|
|
sjudeng <sju...@...>
Hi Samant, Are you still able to move forward with getting your work on this submitted? If so do you want to call a vote on the proposed refactoring or do you want me to? One way or another I think the refactoring is definitely needed. It came up in a recent PR where configuration needed to be duplicated in janusgraph-cassandra and janusgraph-cql because of the current state of things. Thanks! |
|
sjudeng <sju...@...>
All, Samant has said he should be able to continue working on this effort. The current proposal is to refactor into the following structure. ├─ janusgraph-cassandra-parent/ │ ├── astyanax │ ├── cql │ ├── core │ ├── embedded │ ├── test │ ├── thrift +1 for me. I think it's a step in the right direction and I'd hate to see the work already done on this go to waste. Jason, is this something that needs a separate vote thread or can the work/PR just move forward? Samant, What do you think about Ted's question above? On Saturday, June 17, 2017 at 12:28:41 PM UTC-5, sjudeng wrote:
|
|
sjudeng <sju...@...>
Samant, It's been 5 days and no one has yelled too loud so I think it's reasonable to move forward with your PR on this if you're still up for it. Ted, I don't think the reorganization here would cause any additional complexities with the eventual deprecation of Thrift. On the surface it seems to me it might make it simpler to carry out the deprecation/removal if the Thrift components are separated into a standalone module like here. On Friday, June 23, 2017 at 10:45:50 PM UTC-5, sjudeng wrote:
|
|
Ted Wilmes <twi...@...>
Yeah, seems like it would be easier to tease apart the pieces we wanted to deprecate after this restructure so I think it makes good sense. --Ted On Thu, Jun 29, 2017 at 9:38 PM, sjudeng <sju...@...> wrote:
|
|
Samant Maharaj <samant...@...>
I'm now working on bringing the branch up to date against master and will raise the PR as soon as it's ready. There've been a few changes since I did that initial work so it might take a short while to get it all squared away. Regards, Samant On Friday, 30 June 2017 14:38:03 UTC+12, sjudeng wrote:
|
|
Samant Maharaj <samant...@...>
I’ve been putting a significant amount of time into the proposed cassandra split and have a branch here: https://github.com/orionhealth/janusgraph/tree/feature/cassandra-split with
the core changes.
As part of this work I’ve improved the unit testing setup so that it’s not driven by Maven. This has had some flow on effects as the existing build system has the vast majority of maven config in the root POM. I have another branch based on the
cassandra-split branch with the more significant maven changes here: https://github.com/orionhealth/janusgraph/tree/feature/maven-refactoring
In my opinion the existing maven setup does not follow best practices in a number of areas and my refactoring branch’s goal is to simplify the build system and to make it easier to work with and modify. I’d like to get your opinions on whether
this is a worthwhile change and I’d be very happy to discuss the merits of my approach with you. In any case my overall goal is to complete the cassandra-split to the point that it can be merged and it is up for discussion as to whether that includes the additional
maven refactoring.
It is probably a good idea for the maven refactoring discussion to be moved to a different thread or alternatively I could create a room on Gitter for discussion.
Thoughts?
|
|