Re: [PROPOSAL] Strict Schema


rahul.n....@...
 

We are currently evaluating JanusGraph for our new project, and it so happens that just this morning we discussed the exact use case Ted describes here:


the "name" property on Person could be of a different type than the "name" property on a "Building" vertex.


We found this important feature missing from JanusGraph, and it could affect our final decision. Naturally, we thought of the same workaround that Florian mentioned: prefixing the property key name with the vertex label.

This would be a very useful addition to JanusGraph.



On Thursday, January 11, 2018 at 12:16:59 AM UTC-6, ra...@... wrote:
I also see it the same way as Florian.

As long as there are no property-specific constraints like string length, we can work with global property definitions. After all, the property name itself often implies its data type across different element types, so type conflicts are unlikely (e.g. Name -> String, Size -> Long).

But once there are such constraints (we implement them within our abstraction), element-specific property definitions could prove useful. For example, this information could be communicated to storage backends so that they can take advantage of reduced storage size and faster querying rather than seeing a serialized Java object (e.g. map String(255) to VARCHAR(255)).
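
A minimal sketch of that mapping in Python, assuming a hypothetical `PropertyDefinition` record and `to_column_type` helper (neither is a JanusGraph API, and the fallback column types are illustrative choices). It only shows how a length constraint on a property definition could let a backend pick a concrete column type instead of an opaque serialized object:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyDefinition:
    """Hypothetical element-specific property definition."""
    name: str
    data_type: type
    max_length: Optional[int] = None  # e.g. 255 for String(255)


def to_column_type(definition: PropertyDefinition) -> str:
    """Pick a relational column type for a property definition."""
    if definition.data_type is str:
        if definition.max_length is not None:
            # A known length constraint allows a bounded column.
            return f"VARCHAR({definition.max_length})"
        return "TEXT"
    if definition.data_type is int:
        return "BIGINT"
    # Without a concrete type the backend only sees a serialized object.
    return "BLOB"


print(to_column_type(PropertyDefinition("name", str, max_length=255)))
```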

-Rainer Pichler
https://twitter.com/rainerpichler

On Monday, January 8, 2018 at 1:58:39 PM UTC+1, Florian Hockmann wrote:
Hi Ted,

I think the second option would be the better one in the long term, as it allows property keys to be defined again for different vertices where they have different meanings. We currently often include the vertex label in property keys as a workaround to avoid problems when adding indexes for new vertex labels on property keys that already exist. So we have property keys like CityName, CountryName, and so on. That shouldn't be necessary anymore with your second option.

However, it's probably much easier to implement the first option, as it's closer to the way property keys currently work in JanusGraph. Since even the first option would bring most of the benefits of a strict schema, I would suggest implementing it first and the second one in a later version.

Regards,
Florian

On Sunday, January 7, 2018 at 4:11:55 PM UTC+1, Ted Wilmes wrote:
Hello,
That's helpful input, Rainer, and brings up a good question as to how far we want to
go with this. I think one option would be to keep the PropertyKey type definitions as
they are now (global), but allow them to be mapped to specific vertex and edge
labels. The second would be more in line with what you're suggesting, if I'm understanding
correctly, which is that properties are only created in the context of a specific vertex
or edge label. This would be much more familiar to folks who are used to working with
an RDBMS, e.g. the "name" property on Person could be of a different type than
the "name" property on a "Building" vertex. I think this could be particularly helpful
if we add other constraints later. For example, say we have an "age" property
on a Person vertex and allow a user to specify a min and a max, or a not-null.
Ideally, they'd be able to specify a different constraint in the context of another
vertex/edge label. This could still be done with a global property key definition, but the
constraints would then be tied to the (element label, property key) tuple rather than just the
unique property key.
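
To make the tuple idea concrete, here is a rough Python sketch in which constraints are looked up by (element label, property key) rather than by property key alone. The `Constraint` class, the `is_valid` helper, the bounds, and the "age" property on Building are all made up for illustration; none of this is an existing JanusGraph API:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Constraint:
    """Hypothetical constraint: optional min, optional max, not-null."""
    minimum: Optional[int] = None
    maximum: Optional[int] = None
    not_null: bool = False

    def check(self, value: Optional[int]) -> bool:
        if value is None:
            return not self.not_null
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# Constraints are keyed by (label, property key), so the same key can
# carry different rules on different labels. Values are illustrative.
constraints: Dict[Tuple[str, str], Constraint] = {
    ("Person", "age"): Constraint(minimum=0, maximum=150, not_null=True),
    ("Building", "age"): Constraint(minimum=0),
}


def is_valid(label: str, key: str, value: Optional[int]) -> bool:
    """Unconstrained (label, key) pairs accept any value."""
    constraint = constraints.get((label, key))
    return constraint is None or constraint.check(value)
```

With these illustrative rules, an age of 300 would be rejected for a Person but accepted for a Building, even though both use the same property key.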

I had put together some examples of the first simpler approach, but now that I 
think about it, I'd like us to determine how far down this rabbit hole we should 
go on the first pass of this schema support work with the high level options being:

1) Define property keys globally as they are now, but allow the user to map
them to vertex and edge labels. The implication is that there is only one of each
property key (e.g. name is always a String).

2) Define property keys in the context of a specific vertex or edge label. There 
can be more than one property key with the same name. Think column definitions in an RDBMS.
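
The difference between the two options can be sketched as a toy Python model. The class and method names (`GlobalSchema`, `LabelScopedSchema`, `define_key`, `map_key`, `type_of`) are hypothetical and are not part of JanusGraph; the sketch only illustrates where the type information lives in each case:

```python
from typing import Dict, Optional, Set, Tuple


class GlobalSchema:
    """Option 1: one definition per property key, mapped to labels."""

    def __init__(self) -> None:
        self.key_types: Dict[str, type] = {}
        self.label_keys: Dict[str, Set[str]] = {}

    def define_key(self, key: str, data_type: type) -> None:
        existing = self.key_types.get(key)
        if existing is not None and existing is not data_type:
            # A key has exactly one type graph-wide.
            raise ValueError(f"{key!r} is already defined as {existing.__name__}")
        self.key_types[key] = data_type

    def map_key(self, label: str, key: str) -> None:
        if key not in self.key_types:
            raise KeyError(f"unknown property key {key!r}")
        self.label_keys.setdefault(label, set()).add(key)

    def type_of(self, label: str, key: str) -> Optional[type]:
        if key in self.label_keys.get(label, set()):
            return self.key_types[key]
        return None


class LabelScopedSchema:
    """Option 2: one definition per (label, key), like table columns."""

    def __init__(self) -> None:
        self.types: Dict[Tuple[str, str], type] = {}

    def define_key(self, label: str, key: str, data_type: type) -> None:
        self.types[(label, key)] = data_type

    def type_of(self, label: str, key: str) -> Optional[type]:
        return self.types.get((label, key))
```

In this model, option 1 rejects a second definition of "name" with a different type, while option 2 lets Person and Building each carry their own "name" definition.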

Historically, the first would be adequate for me in the majority of cases, but the 
flexibility of the second would be quite powerful.

What do you all think would be most helpful based upon your day-to-day modeling work?

Thanks,
Ted

On Tuesday, December 19, 2017 at 10:55:01 AM UTC-6, Rainer Pichler wrote:
We at CELUM also put a custom model on top of JanusGraph that supports a type system and multi-inheritance for vertex/edge types.

The global scope of property key definitions forces us to define every property's data type as Object, because same-named properties on elements of different types might have different types (this also revealed the issue https://groups.google.com/forum/#!topic/janusgraph-dev/3KIDmHuTcwo). Overcoming this limitation should reduce storage overhead, since we could then work with concrete property value types.

We solved the traversal-time schema enforcement by having a (compile-time) type-safe query language on top of Gremlin that also implements the type inheritance logic (Intro: https://www.celum.com/en/blog/technology/a-querys-quest). Type inheritance is modelled via additional properties. Soon, I will release a blog article that elaborates on one of our use cases and highlights the benefits of a strict schema and type-safety.

-Rainer Pichler
https://twitter.com/rainerpichler
