I'm interested in writing a Solr storage backend for JanusGraph that stores nodes/edges in addition to indexes, and I would like some pointers on doing so. We would open source the implementation as part of JanusGraph.
I work at an analytics startup and we're using JanusGraph to analyze the open source ecosystem. We have a Solr cluster to search over the same data that will appear in our JanusGraph database. It occurred to me that it would be nice to run one less system: rather than storing our graph data a second time in something like Cassandra or Bigtable, why not access it on the Solr cluster? Solr is a capable database. It would be cool to use the same index, but even if the graph lived in a different index served from different machines in the cluster, we would have significantly less operational overhead than if we ran a second system and had to develop expertise in both.
So, my questions are:
* Is a Solr storage backend for nodes/edges a good idea? Terrible idea? Smart or stupid? Thoughts in general? Would you use this?
* What is the best resource on creating a storage backend? Google results are not promising.
* How much work would be involved to extend the janusgraph-solr module to also store nodes/edges?
* Is there a particular existing storage backend that I could use as a reference because it would be similar to a Solr backend for nodes/edges? Is the in-memory backend the most generic one, or is another one closer to something like Solr?
* More generally, how much work is a new storage backend? The existing storage backends vary a lot in how much code they contain, so I have no basis for an estimate.
* Would anyone be interested in doing this work as a consultant? Advising or building? If you built it, how long would it take you?
Thanks in advance!
Russell Jurney, Founding Engineer @ Archipelo.co (we do foss graphs)