Re: [PROPOSAL] Strict Schema


ankur...@...
 

My thought: schema enforcement should apply only to graph CRUD operations, not to traversals, and even then it should be optional, depending on the use case.

~


On Thursday, December 14, 2017 at 10:37:38 PM UTC+5:30, David Pitera wrote:
I haven't fully thought about this yet, but my initial reaction is that (1) the schema enforcement should be opt-in and (2) we want to avoid performance degradation in the case that schema enforcement is not enabled. Of course schema enforcement will require some overhead.

It seems that the notion of a schema in JanusGraph is currently moot; it does enforce some data typing, but the vertexLabels/edgeLabels/propertyKeys seem to be created mostly for their use in index definitions.

On Thu, Dec 14, 2017 at 11:49 AM, Ted Wilmes <t...@...> wrote:
Hi Florian,
I think this would be a very worthwhile addition. Provided folks are in agreement, I think a good next step would be to spec out the additions to JanusGraphManagement, a format for the schema definition that could be ingested by callers to infer the current schema (object graph, json, etc.), and also to define the interaction of this new feature and user queries, or in other words, what schema enforcement will look like.
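To make the "format for the schema definition" part a bit more concrete, here is a rough sketch of what a JSON export could look like and how a caller might read it back. The field names (`vertexLabels`, `edgeLabels`, `connections`) and the helper functions are made up for illustration; nothing like this exists in JanusGraphManagement today.

```python
import json

# Hypothetical layout for an exported schema; every field name here is an
# assumption made for illustration, not an existing JanusGraph format.
schema_definition = {
    "vertexLabels": [
        {"name": "god", "properties": ["name", "age"]},
        {"name": "location", "properties": ["name"]},
    ],
    "edgeLabels": [
        {"name": "brother", "connections": [{"out": "god", "in": "god"}]},
    ],
}


def export_schema(definition):
    """Serialize the schema definition to a JSON string."""
    return json.dumps(definition, indent=2, sort_keys=True)


def allowed_properties(schema_json, vertex_label):
    """Return the property keys a vertex label may carry ([] if unknown)."""
    for label in json.loads(schema_json)["vertexLabels"]:
        if label["name"] == vertex_label:
            return label["properties"]
    return []


def allowed_connections(schema_json, edge_label):
    """Return the (out label, in label) pairs an edge label may connect."""
    for label in json.loads(schema_json)["edgeLabels"]:
        if label["name"] == edge_label:
            return [(c["out"], c["in"]) for c in label["connections"]]
    return []
```

A caller would fetch the JSON once and then ask, for example, `allowed_properties(text, "location")` to learn which keys a location may have.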

Thanks,
Ted

On Thursday, December 14, 2017 at 7:35:33 AM UTC-6, Florian Hockmann wrote:

Currently, the schema for JanusGraph is basically only a list of allowed labels (for vertices and edges) and available properties. What's missing in my opinion is the option to specify which vertex and edge label can have which property keys and which edge labels can connect which vertex labels.


Just to give an idea of what I mean, here are two examples for the Graph of Gods:

  • Gods can have the property keys name and age, whereas locations only have a name (no age allowed).
  • The edge label brother can connect gods, but not a god with a location.
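
The two constraints above could be sketched roughly as follows. This is only an illustration of the idea in plain Python; `StrictSchema`, `allow_properties`, `allow_connection` and the check methods are hypothetical names, not part of JanusGraph.

```python
# Hypothetical sketch of the proposed constraints; none of these names
# exist in JanusGraph.


class SchemaViolation(Exception):
    """Raised when data does not comply with the declared constraints."""


class StrictSchema:
    def __init__(self):
        # vertex label -> set of allowed property keys
        self.vertex_properties = {}
        # edge label -> set of allowed (out label, in label) pairs
        self.connections = {}

    def allow_properties(self, vertex_label, *keys):
        self.vertex_properties.setdefault(vertex_label, set()).update(keys)

    def allow_connection(self, edge_label, out_label, in_label):
        self.connections.setdefault(edge_label, set()).add((out_label, in_label))

    def check_vertex(self, label, properties):
        """Reject property keys that were not declared for this label."""
        allowed = self.vertex_properties.get(label, set())
        illegal = sorted(set(properties) - allowed)
        if illegal:
            raise SchemaViolation(f"{label} may not have properties {illegal}")

    def check_edge(self, edge_label, out_label, in_label):
        """Reject edges between vertex labels that were not declared."""
        if (out_label, in_label) not in self.connections.get(edge_label, set()):
            raise SchemaViolation(
                f"{edge_label} may not connect {out_label} -> {in_label}"
            )


# The two Graph of Gods constraints from the list above.
schema = StrictSchema()
schema.allow_properties("god", "name", "age")
schema.allow_properties("location", "name")
schema.allow_connection("brother", "god", "god")
```

With this in place, `schema.check_edge("brother", "god", "location")` raises `SchemaViolation`, while a brother edge between two gods passes.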

This is of course only a toy graph, but I suspect that most real-world data models contain similar constraints.

If we allow users to enforce those constraints inside of JanusGraph, they can be sure that no user of their database can insert data that doesn't comply with them (e.g., a brother edge that connects a god with a location). So, a strict schema ensures that the graph is in a consistent state with respect to those constraints.[1]

In schema-less databases this schema is often included implicitly in the client applications, as those applications need to know how they can access the data. So even if the database is schema-less, there is still an implicit schema. This means that updating the (implicit) schema isn't really any easier when it is not explicitly defined in the database, as it still needs to be changed in every client application.

Having this schema explicitly defined in JanusGraph also makes it easy to tell new users what kind of data they can expect, e.g., they know that a location can't have an age, but a god can. This would also allow tools to fetch the schema from a JanusGraph instance and visualize it. Such a visualization makes it much easier to reason about the schema, as it provides an easy-to-understand representation of it.

Finally, an explicit schema would also allow OGM (object graph mapper) tools to fetch the schema from JanusGraph and translate it into entity classes, so that the schema only has to be defined in one place (DRY principle).
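
As a rough illustration of the OGM idea, a tool could turn a fetched schema into entity classes along these lines. The shape of `fetched_schema` and the function `build_entity_classes` are assumptions made for this sketch; how the schema would actually be fetched from JanusGraph is left open.

```python
from dataclasses import fields, make_dataclass

# Assumed result of fetching the schema: vertex label -> {property key: type}.
# The structure is hypothetical and only serves this example.
fetched_schema = {
    "god": {"name": str, "age": int},
    "location": {"name": str},
}


def build_entity_classes(schema):
    """Generate one dataclass per vertex label from the schema."""
    return {
        label: make_dataclass(label.capitalize(), list(properties.items()))
        for label, properties in schema.items()
    }


entities = build_entity_classes(fetched_schema)
God = entities["god"]
Location = entities["location"]

# The generated classes only accept the properties the schema declares.
saturn = God(name="saturn", age=10000)
sky = Location(name="sky")
```

Because the classes are derived from the schema, trying to construct `Location(name="sky", age=1)` fails with a `TypeError` on the client side, before anything is sent to the database.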

So, in short, I propose that JanusGraph gets a strict schema, either as the only option or as an additional option for backwards-compatibility with existing deployments and their data models.

Regards,

Florian


[1] We actually had the problem with our JanusGraph database that it contained data which shouldn’t be possible. Our schema models the network traffic of malware samples, so we have edge labels like SampleToDomain or SampleToIp that connect samples with domains or IP addresses they contacted. At some point we found edges in our graph that connected samples with domains and had an edge label of SampleToIp which is problematic as our applications of course expect an IP address when they follow a SampleToIp edge.
