[DISCUSS] Revamp JanusGraph Management


Jan Jansen <faro...@...>
 

Hi everyone,

Nearly a year ago I started investigating how we could revamp JanusGraph Management.

Why do we want to revamp JanusGraph Management? Schema Management and Index Management are buggy
and hard to refactor without introducing breaking changes.
For users of programming languages other than Java, we currently don't have any solution.

I came up with two basic ideas:
  • Schema Management using Gremlin
  • HTTP Admin API (GraphQL or gRPC)

Schema Management using Gremlin
Most databases allow schema management through their default query language.

Advantage:
  • Known query language

Disadvantage:
  • To support management tools in different languages, we would have to port the queries to every language.

Problems:
  • A massive number of internal classes would have to be exported as public.
  • Refactoring wouldn't be straightforward, which prevents us from changing the schema classes without breaking changes (class types and their order are saved in the database).
    A solution would be to implement a new Graph class from the ground up and map internal vertex types to newly created vertex types.
  • This wouldn't allow tasks such as index repair or reindex.

HTTP Admin API
GraphQL and gRPC both allow us to build a new management interface from the ground up with newly created types.

Advantage:
  • Auto-generated client libs for different languages.
  • Step-by-step implementation.
  • Allows adding a health check endpoint, which can be used by a Docker health check or Kubernetes liveness checks.
  • Long-running tasks, such as repair or reindex, can be handled using streaming in gRPC and subscriptions in GraphQL.
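
To make the health check point a bit more concrete, here is a minimal sketch using only Python's standard library. The `/health` path, the `storage` and `index` check names, the JSON body shape and the `make_server` helper are all assumptions made for this illustration; none of them come from JanusGraph or from an existing admin API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical checks: each callable returns True when the component is reachable.
# A real server would ping its storage and index backends here.
CHECKS = {
    "storage": lambda: True,
    "index": lambda: True,
}


def health_status(checks):
    """Run every check and return (http_status, json_body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A crashing check counts as unhealthy instead of breaking the endpoint.
            results[name] = False
    healthy = all(results.values())
    body = json.dumps({"status": "UP" if healthy else "DOWN", "checks": results})
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the (assumed) /health path is served by this sketch.
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_status(CHECKS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8080):
    """Build (but do not start) the HTTP server; call serve_forever() to run it."""
    return HTTPServer(("", port), HealthHandler)
```

Running `make_server().serve_forever()` would expose the endpoint; a container probe could then request `/health` and treat a 503 response as unhealthy. Keeping the check logic in `health_status` separate from the HTTP handler means the same function could back a gRPC or GraphQL endpoint instead.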

After adding an admin API and revamping the internal implementation, we could come back and add Gremlin-based schema management.

Questions

  • What do the community members think about it?
  • Do you see other advantages or disadvantages?


Greetings,
Jan


Niklas Schiffler <nschi...@...>
 

IMHO the implementation of a Gremlin way to manage schemas does not provide a major benefit unless it's somewhat standardized in TinkerPop.

An HTTP API would have the benefit of making schema management available to users of the Gremlin server without the mentioned problems of making it accessible via Gremlin. As it would just expose the underlying Java API (current or a new one), it would be nice if the web service were optional when JG is used embedded.

nik..


On 26-02-2020 15:39, 'Jan Jansen' via JanusGraph developers wrote:


Stephen Mallette <spmal...@...>
 

Josh Shinavier has been doing the most thinking around some form of Schema API for TinkerPop, but I couldn't say what the timeline is at this point for such work to make it into TinkerPop. I know he's thinking about the green fields of TP4 but I think I'd like to see "data type" capabilities in Gremlin in TP3 if possible and I'd imagine that if we could see a way for his schema ideas to be available in TP3 we would try to take them there. 

If you haven't been following his talks on the topic over the last few months, you might want to take a look at some of his presentations that are out there. You might also find a few videos of them if you search.


On Fri, Feb 28, 2020 at 6:09 AM Niklas Schiffler <nschi...@...> wrote:



Jan Jansen <faro...@...>
 

An HTTP API would have the benefit of making the schema management available to users of the Gremlin server without the mentioned problems for making it accessible via Gremlin. As it would just expose the underlying Java API (current or a new one), it would be nice if the web service would be optional if JG is used embedded.


+1

One of my major concerns is to make JanusGraph more modular.


Oleksandr Porunov <alexand...@...>
 

Hi,
As a follow-up to this thread, I also think it would be great if we could use a DDL for schema definition.
I don't know yet which format we should use (a Gremlin query, JSON, XML, or something else).
Are there any updates regarding Schema Management?
To avoid duplicating work on Schema Management, I would like to know whether someone has been working on a DDL for the schema. If so, were there any difficulties? Which format is better suited to schema definition?
Currently I am thinking about a JSON format, something like:
[
  {
    "type": "vertexLabel",
    "name": "myVertexLabel"
  },
  {
    "type": "edgeLabel",
    "name": "myEdgeLabel",
    "multiplicity": "MANY2ONE"
  },
  {
    "type": "propertyKey",
    "dataType": "String.class",
    "name": "myProperty",
    "cardinality": "LIST"
  },
  {
    "type": "compositeIndex",
    "indexOnly": "myVertexLabel",
    "keys": [
      {
        "name": "myProperty"
      }
    ]
  },
  {
    "type": "mixedIndex",
    "keys": [
      {
        "name": "myProperty",
        "parameters": [
          {
            "key": "parameterKey",
            "value": "parameterValue"
          }
        ]
      }
    ]
  }
]

I haven't thought much about schema definition yet, so I might be missing some points. Any thoughts or suggestions on the format, or on how this feature should be implemented, would be great. Also, I haven't worked with GraphQL, so I will check a bit later whether that format could be used as well.
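
As a starting point for discussing the format, here is a rough sketch of how a tool might load and sanity-check such a JSON document before applying it. It only uses Python's standard library. The set of required fields per type, the `validate_schema` name and the rule that index keys must refer to property keys declared in the same document are assumptions made for this example, not part of any existing JanusGraph tooling.

```python
import json

# Required fields per element type, loosely following the draft format above.
# Which fields are mandatory is an assumption made for this sketch.
REQUIRED_FIELDS = {
    "vertexLabel": {"name"},
    "edgeLabel": {"name", "multiplicity"},
    "propertyKey": {"name", "dataType", "cardinality"},
    "compositeIndex": {"keys"},
    "mixedIndex": {"keys"},
}


def validate_schema(text):
    """Parse a schema document and return a list of problems (empty if none)."""
    problems = []
    elements = json.loads(text)
    # Property keys declared in this document; index keys must refer to one of them.
    declared_keys = {e.get("name") for e in elements if e.get("type") == "propertyKey"}
    for position, element in enumerate(elements):
        kind = element.get("type")
        if kind not in REQUIRED_FIELDS:
            problems.append(f"element {position}: unknown type {kind!r}")
            continue
        missing = REQUIRED_FIELDS[kind] - set(element)
        for field in sorted(missing):
            problems.append(f"element {position} ({kind}): missing {field!r}")
        for key in element.get("keys", []):
            name = key.get("name")
            if name not in declared_keys:
                problems.append(f"element {position} ({kind}): unknown property key {name!r}")
    return problems


SCHEMA = """
[
  {"type": "vertexLabel", "name": "myVertexLabel"},
  {"type": "edgeLabel", "name": "myEdgeLabel", "multiplicity": "MANY2ONE"},
  {"type": "propertyKey", "name": "myProperty", "dataType": "String.class", "cardinality": "LIST"},
  {"type": "compositeIndex", "indexOnly": "myVertexLabel", "keys": [{"name": "myProperty"}]}
]
"""

print(validate_schema(SCHEMA))  # []
```

Returning a list of problems rather than raising on the first one lets a user fix a whole schema file in one pass. A validated document could then be translated into management calls by whichever API ends up being chosen.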

Best regards,
Oleksandr


On Friday, February 28, 2020 at 8:30:21 PM UTC+2 fa...@... wrote:



"fa...@googlemail.com" <faro...@...>
 

Hi Oleksandr,

I started to work on a gRPC client/server for JanusGraph (https://github.com/farodin91/janusgraph-grpc) to manage all parts of JanusGraph; one of them, of course, would be the schema. Other parts are, for example, CFG or index management.
My next step is to integrate a basic gRPC client into JanusGraph, which allows us to get some basic information about a running JanusGraph server. I will then extend it with schema functions step by step. The gRPC protocol can later be used to provide a tool to import and export the schema as JSON.

Greetings,
Jan

On Wednesday, July 15, 2020 at 12:23:44 PM UTC+2 Oleksandr Porunov wrote: