Re: [PROPOSAL] Strict Schema

Ted Wilmes <twi...@...>

I agree with your logic. I'm inclined to work through the first option to see if we run into any other pitfalls; I think that will ultimately help guide us if we
decide to make PropertyKeys local to specific vertex and edge types. I just added an issue[1] with a first cut at the idea from an API standpoint for us to throw
darts at and update as needed.


On Monday, January 8, 2018 at 6:58:39 AM UTC-6, Florian Hockmann wrote:
Hi Ted,

I think the second option would be the better one in the long term, as it allows the same property key to be defined separately for different vertex labels where it has different meanings. We currently often include the vertex label in property keys as a workaround, to avoid problems with adding indexes for new vertex labels on property keys that already exist. So we have property keys like CityName, CountryName, and so on. That shouldn't be necessary anymore with your second option.

However, it's probably much easier to implement the first option, as it's closer to the way property keys currently work in JanusGraph. Since even the first option would bring most of the benefits of a strict schema, I would suggest implementing it first and the second one in a later version.


On Sunday, January 7, 2018 at 4:11:55 PM UTC+1, Ted Wilmes wrote:
That's helpful input, Rainer, and brings up a good question as to how far we want to 
go with this. I think one option would be to keep the PropertyKey type definitions as 
they are now (global), but allow them to be mapped to specific vertex and edge 
labels. The second would be more in line with what you're suggesting, if I'm understanding 
correctly: properties would only be created in the context of a specific vertex
or edge label. This would be much closer to the way folks are used to using 
an RDBMS, e.g. the "name" property on a "Person" vertex could be of a different type than 
the "name" property on a "Building" vertex. I think this could be particularly helpful 
if we add other constraints in later. For example, say we have an "age" property 
on a Person vertex and allow a user to specify a min & a max, or a not-null. 
Ideally, they'd be able to specify a different constraint in the context of another 
vertex/edge label. This could still be done with a global propertykey definition, but the 
constraints then would be tied to the element label/propertykey tuple vs just the 
unique propertykey.
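
To make the label/propertykey tuple idea concrete, here is a minimal sketch of a constraint registry keyed by (label, key) rather than by key alone. Everything in it is hypothetical and only for discussion: the class and method names are not JanusGraph API, and the "age" constraint on "Building" is an invented example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for discussion; none of these names are JanusGraph API.
public class ScopedConstraintSketch {

    // A simple inclusive min/max constraint on an integer property value.
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean allows(int value) {
            return value >= min && value <= max;
        }
    }

    // Constraints are looked up by the (label, key) tuple, not by the key alone.
    private final Map<List<String>, Range> rangesByScope = new HashMap<>();

    void constrain(String label, String key, int min, int max) {
        rangesByScope.put(Arrays.asList(label, key), new Range(min, max));
    }

    // A value is valid if no constraint exists for this tuple or the range allows it.
    boolean isValid(String label, String key, int value) {
        Range range = rangesByScope.get(Arrays.asList(label, key));
        return range == null || range.allows(value);
    }

    public static void main(String[] args) {
        ScopedConstraintSketch schema = new ScopedConstraintSketch();
        schema.constrain("Person", "age", 0, 150);
        schema.constrain("Building", "age", 0, 5000);

        // Same key name, different constraint depending on the element label.
        System.out.println(schema.isValid("Person", "age", 300));
        System.out.println(schema.isValid("Building", "age", 300));
    }
}
```

With a lookup like this, the "age" key itself could stay a single global definition while the constraint varies per label.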

I had put together some examples of the first simpler approach, but now that I 
think about it, I'd like us to determine how far down this rabbit hole we should 
go on the first pass of this schema support work with the high level options being:

1) Define property keys globally as they are now, but allow the user to map 
them to vertex and edge labels. The implication is that there is only one of each 
property key (e.g. name is always a String).

2) Define property keys in the context of a specific vertex or edge label. There 
can be more than one property key with the same name. Think column definitions in an RDBMS.
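
As a rough illustration of how the two options differ, here is a small in-memory sketch. It is not the proposed API and uses no JanusGraph classes; GlobalKeySchema, LabelScopedSchema and their methods are hypothetical names chosen just for this comparison.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model for illustration; none of these names are JanusGraph API.
public class SchemaOptionsSketch {

    // Option 1: each key name has exactly one global type; labels only list the keys they allow.
    static class GlobalKeySchema {
        private final Map<String, Class<?>> keyTypes = new HashMap<>();
        private final Map<String, Set<String>> keysByLabel = new HashMap<>();

        void defineKey(String key, Class<?> type) {
            Class<?> existing = keyTypes.putIfAbsent(key, type);
            if (existing != null && existing != type) {
                throw new IllegalArgumentException(
                        "Key '" + key + "' is already defined globally as " + existing.getSimpleName());
            }
        }

        void mapKeyToLabel(String label, String key) {
            if (!keyTypes.containsKey(key)) {
                throw new IllegalArgumentException("Unknown property key: " + key);
            }
            keysByLabel.computeIfAbsent(label, l -> new HashSet<>()).add(key);
        }

        boolean accepts(String label, String key, Object value) {
            Set<String> allowed = keysByLabel.get(label);
            return allowed != null && allowed.contains(key) && keyTypes.get(key).isInstance(value);
        }
    }

    // Option 2: keys are defined per label, so the same name can carry different types.
    static class LabelScopedSchema {
        private final Map<String, Map<String, Class<?>>> typesByLabel = new HashMap<>();

        void defineKey(String label, String key, Class<?> type) {
            typesByLabel.computeIfAbsent(label, l -> new HashMap<>()).put(key, type);
        }

        boolean accepts(String label, String key, Object value) {
            Map<String, Class<?>> keys = typesByLabel.get(label);
            Class<?> type = (keys == null) ? null : keys.get(key);
            return type != null && type.isInstance(value);
        }
    }

    public static void main(String[] args) {
        GlobalKeySchema global = new GlobalKeySchema();
        global.defineKey("name", String.class);
        global.mapKeyToLabel("Person", "name");
        global.mapKeyToLabel("Building", "name");
        // "name" is a String everywhere, so an integer name is rejected on any label.
        System.out.println(global.accepts("Building", "name", 42));

        LabelScopedSchema scoped = new LabelScopedSchema();
        scoped.defineKey("Person", "name", String.class);
        scoped.defineKey("Building", "name", Integer.class);
        // The same key name may have a different type under another label.
        System.out.println(scoped.accepts("Building", "name", 42));
    }
}
```

In this sketch, option 1 refuses a second, conflicting definition of "name", whereas option 2 treats each label's "name" like a separate column definition.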

Historically, the first would be adequate for me in the majority of cases, but the 
flexibility of the second would be quite powerful.

What do you all think would be most helpful based upon your day-to-day modeling work?


On Tuesday, December 19, 2017 at 10:55:01 AM UTC-6, Rainer Pichler wrote:
We at CELUM also put a custom model on top of JanusGraph that supports a type system and multi-inheritance for vertex/edge types.

The global scope of property key definitions forces us to define every property's data type as Object, since same-named properties on elements of different types might have different data types
(this also revealed the issue!topic/janusgraph-dev/3KIDmHuTcwo). Overcoming this limitation should then reduce storage overhead, since we could work with concrete property value types.

We solved the traversal-time schema enforcement by having a (compile-time) type-safe query language on top of Gremlin that also implements the type inheritance logic. Type inheritance is modelled via additional properties. Soon, I will release a blog article that elaborates on one of our use cases and highlights the benefits of a strict schema and type-safety.

-Rainer Pichler
