Florian Hockmann <f...@...>
Currently, the schema for JanusGraph is basically only a list of allowed
labels (for vertices and edges) and available properties. What's missing, in my
opinion, is the option to specify which vertex and edge labels can have which property
keys and which edge labels can connect which vertex labels.
Just to give an idea of what I mean, here are two examples for the Graph of Gods:
- Gods can have the property keys name and age, whereas locations only
have a name (no age allowed).
- The edge label brother can connect gods, but not a god with a
location.
This is of course only a toy graph, but I suspect that most real-world data
models contain similar constraints.
When we allow users to enforce those constraints inside of JanusGraph
then they can be sure that no user of their database can insert data that
doesn't comply with these constraints (e.g., a brother edge that connects a god
with a location). So, a strict schema ensures that the graph is in a consistent
state with respect to those constraints.[1]
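To make the two Graph of Gods constraints above concrete, here is a minimal Python sketch of what a strict schema check could look like. It is only an illustration under stated assumptions: `StrictSchema`, `SchemaViolation`, `check_property` and `check_edge` are hypothetical names, not part of the JanusGraph API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


class SchemaViolation(Exception):
    """Raised when a write would break a schema constraint (hypothetical)."""


@dataclass
class StrictSchema:
    # vertex label -> property keys allowed on vertices with that label
    vertex_properties: Dict[str, Set[str]] = field(default_factory=dict)
    # edge label -> allowed (outgoing vertex label, incoming vertex label) pairs
    connections: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def check_property(self, vertex_label: str, key: str) -> None:
        allowed = self.vertex_properties.get(vertex_label, set())
        if key not in allowed:
            raise SchemaViolation(
                f"'{vertex_label}' vertices cannot have property '{key}'"
            )

    def check_edge(self, edge_label: str, out_label: str, in_label: str) -> None:
        allowed = self.connections.get(edge_label, set())
        if (out_label, in_label) not in allowed:
            raise SchemaViolation(
                f"'{edge_label}' cannot connect '{out_label}' to '{in_label}'"
            )


# The two example constraints from the Graph of Gods.
gods_schema = StrictSchema(
    vertex_properties={"god": {"name", "age"}, "location": {"name"}},
    connections={"brother": {("god", "god")}},
)

gods_schema.check_property("god", "age")         # allowed
gods_schema.check_edge("brother", "god", "god")  # allowed
```

With this sketch, `gods_schema.check_property("location", "age")` and `gods_schema.check_edge("brother", "god", "location")` both raise `SchemaViolation`, which is the kind of rejection a strict schema would perform at write time.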
In schema-less databases this schema is often included implicitly in the
client applications, as those applications need to know how they can access the
data. So even if the database is schema-less, there is still an implicit
schema. This means that updating the (implicit) schema isn't really any easier
when it is not explicitly defined in the database, because it still has to be
changed in every client application.
Having this schema explicitly defined in JanusGraph also makes it easy
to tell new users what kind of data they can expect, e.g., they know that a
location can't have an age, but a god can. This would also allow tools to fetch
the schema from a JanusGraph instance to visualize it. Such a visualization
makes it much easier to reason about the schema as it provides an easy to
understand representation of it.
Finally, an explicit schema would also allow OGM (object graph mapper)
tools to fetch the schema from JanusGraph and translate it into entity classes,
which makes it possible to have the schema defined in just one place (DRY
principle).
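As a rough illustration of the OGM idea, the following Python sketch turns a schema description into entity classes. The shape of `FETCHED_SCHEMA` and the helper `build_entity_classes` are assumptions made for this example; how a real tool would fetch the schema from JanusGraph is not specified in this thread.

```python
from dataclasses import fields, make_dataclass
from typing import Dict

# Hypothetical schema description, as an OGM tool might receive it:
# vertex label -> {property key: Python type}
FETCHED_SCHEMA: Dict[str, Dict[str, type]] = {
    "god": {"name": str, "age": int},
    "location": {"name": str},
}


def build_entity_classes(schema: Dict[str, Dict[str, type]]) -> Dict[str, type]:
    """Create one dataclass per vertex label, with one field per property key."""
    classes = {}
    for label, properties in schema.items():
        class_name = label.capitalize()
        classes[label] = make_dataclass(class_name, list(properties.items()))
    return classes


entities = build_entity_classes(FETCHED_SCHEMA)
God = entities["god"]
Location = entities["location"]

jupiter = God(name="jupiter", age=5000)
sky = Location(name="sky")
```

Because the classes are derived from the schema, `Location(name="sky", age=1)` fails with a `TypeError`, so the constraint lives in one place and the entity classes follow it.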
So, in short, I propose that JanusGraph gets a strict schema, either as
the only option or as an additional option for backwards-compatibility with existing
deployments and their data models.
Regards, Florian
[1] We actually had the problem with our JanusGraph database that it
contained data which shouldn’t be possible. Our schema models the network
traffic of malware samples, so we have edge labels like SampleToDomain or
SampleToIp that connect samples with domains or IP addresses they contacted. At
some point we found edges in our graph that connected samples with domains and
had an edge label of SampleToIp which is problematic as our applications of
course expect an IP address when they follow a SampleToIp edge.
|
|
Hi Florian, I think this would be a very worthwhile addition. Provided folks are in agreement, I think a good next step would be to spec out the additions to JanusGraphManagement, a format for the schema definition that could be ingested by callers to infer the current schema (object graph, JSON, etc.), and also to define the interaction of this new feature with user queries, or in other words, what schema enforcement will look like.
Thanks, Ted
|
|
David Pitera <piter...@...>
I haven't fully thought about this yet, but my initial reaction is that (1) the schema enforcement should be opt-in and (2) we want to avoid performance degradation in the case that schema enforcement is not enabled. Of course schema enforcement will require some overhead.
It seems that the notion of a schema in JanusGraph is currently moot; it does enforce some data typing, but it seems that the vertexLabels/edgeLabels/propertyKeys are created mostly for their use in index definitions.
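One way to picture the opt-in requirement is a write path that only calls a validator when one has been configured. The Python sketch below is a toy model under that assumption; `ToyGraph`, `make_validator` and the `validator` parameter are hypothetical and do not correspond to any JanusGraph configuration option.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Validator = Callable[[str, Dict[str, object]], None]


def make_validator(allowed: Dict[str, Set[str]]) -> Validator:
    """Build a validator that rejects property keys not allowed for a label."""
    def validate(label: str, properties: Dict[str, object]) -> None:
        unknown = set(properties) - allowed.get(label, set())
        if unknown:
            raise ValueError(f"{label!r} does not allow properties {sorted(unknown)}")
    return validate


class ToyGraph:
    def __init__(self, validator: Optional[Validator] = None) -> None:
        # No validator means schema enforcement is disabled (the default).
        self._validator = validator
        self.vertices: List[Tuple[str, Dict[str, object]]] = []

    def add_vertex(self, label: str, **properties: object) -> None:
        # When enforcement is off, the only extra cost is this None check.
        if self._validator is not None:
            self._validator(label, properties)
        self.vertices.append((label, properties))


lenient = ToyGraph()
lenient.add_vertex("location", name="sky", age=3)  # accepted, nothing is checked

strict = ToyGraph(make_validator({"god": {"name", "age"}, "location": {"name"}}))
strict.add_vertex("god", name="jupiter", age=5000)  # accepted, passes the check
```

In this model, `strict.add_vertex("location", name="sky", age=3)` raises `ValueError`, while the same call on `lenient` succeeds.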
--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/69428dda-baa3-489c-99a1-c316e0728e09%40googlegroups.com.
|
|
Austin Sharp <austins...@...>
In our use of Titan and JanusGraph for several years now we have had a need for this. In fact, over two years ago we built our own ORM-like, schema-enforcement layer and have been using it ever since. So I think this would be an exceptionally useful addition for any serious users of JanusGraph, and would have saved us a lot of work and pain.
|
|
My thought is that schema enforcement should apply only to graph CRUD operations, not to traversals, and even that should be optional depending on the use case.
~
|
|
Rainer Pichler <rain...@...>
We at CELUM also put a custom model on top of JanusGraph that supports a type system and multi-inheritance for vertex/edge types.
The global scope of property key definitions forces us to define all properties' data type as Object as same-named properties on elements of different types might have different types (this also revealed the issue https://groups.google.com/forum/#!topic/janusgraph-dev/3KIDmHuTcwo). Overcoming this limitation should then reduce storage overhead when we can work with concrete property value types.
We solved the traversal-time schema enforcement by having a (compile-time) type-safe query language on top of Gremlin that also implements the type inheritance logic (Intro: https://www.celum.com/en/blog/technology/a-querys-quest). Type inheritance is modelled via additional properties. Soon, I will release a blog article that elaborates on one of our use cases and highlights the benefits of a strict schema and type-safety.
-Rainer Pichler https://twitter.com/rainerpichler
|
|
Hello, That's helpful input, Rainer, and it brings up a good question as to how far we want to go with this. I think one option would be to keep the PropertyKey type definitions as they are now (global), but allow them to be mapped to specific vertex and edge labels. The second would be more in line with what you're suggesting, if I'm understanding correctly: properties would only be created in the context of a specific vertex or edge label. This would be much closer to what folks are used to from an RDBMS, e.g., the "name" property on a "Person" vertex could be of a different type than the "name" property on a "Building" vertex. I think this could be particularly helpful if we add other constraints later. For example, say we have an "age" property on a Person vertex and allow a user to specify a min and a max, or a not-null. Ideally, they'd be able to specify a different constraint in the context of another vertex/edge label. This could still be done with a global PropertyKey definition, but the constraints would then be tied to the element label/PropertyKey tuple rather than just the unique PropertyKey.
I had put together some examples of the first simpler approach, but now that I think about it, I'd like us to determine how far down this rabbit hole we should go on the first pass of this schema support work with the high level options being:
1) Define property keys globally as they are now, but allow the user to map them to vertex and edge labels. The implication is that there is only one definition of each property key (e.g., name is always a String).
2) Define property keys in the context of a specific vertex or edge label. There can be more than one property key with the same name. Think column definitions in an RDBMS.
Historically, the first would be adequate for me in the majority of cases, but the flexibility of the second would be quite powerful.
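The difference between the two options can be sketched with plain Python dictionaries. This is only a toy model: the table names and lookup helpers are hypothetical, and the `int` type for a building's name is chosen purely to show that types may differ under option 2.

```python
from typing import Dict, Set, Tuple

# Option 1: one global definition per property key, plus a label -> keys mapping.
GLOBAL_KEYS: Dict[str, type] = {"name": str, "age": int}
LABEL_KEYS: Dict[str, Set[str]] = {"person": {"name", "age"}, "building": {"name"}}


def type_of_global(label: str, key: str) -> type:
    if key not in LABEL_KEYS.get(label, set()):
        raise KeyError(f"{key!r} is not mapped to label {label!r}")
    return GLOBAL_KEYS[key]  # the same type regardless of label


# Option 2: definitions keyed by (label, property key), like columns in a table.
SCOPED_KEYS: Dict[Tuple[str, str], type] = {
    ("person", "name"): str,
    ("person", "age"): int,
    ("building", "name"): int,  # hypothetical: a different type for the same name
}


def type_of_scoped(label: str, key: str) -> type:
    try:
        return SCOPED_KEYS[(label, key)]
    except KeyError:
        raise KeyError(f"{key!r} is not defined for label {label!r}") from None


# Option 1: "name" is always a str. Option 2: it depends on the label.
same_type = type_of_global("person", "name") is type_of_global("building", "name")
different_type = type_of_scoped("person", "name") is not type_of_scoped("building", "name")
```

Both models reject `age` on a building; only the second lets the type, and later any other constraint, vary per label.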
What do you all think would be most helpful based upon your day-to-day modeling work?
Thanks, Ted
|
|
Florian Hockmann <f...@...>
Hi Ted,
I think the second option would be the better one in the long term, as it allows a property key to be defined separately for different vertex labels where it has different meanings. We currently often include the vertex label in property keys as a workaround to avoid problems with adding indexes on already-existing property keys for new vertex labels. So we have property keys like CityName, CountryName, and so on. That shouldn't be necessary anymore with your second option.
However, it's probably much easier to implement the first option as it's closer to the way property keys currently work in JanusGraph. Since even the first option would bring most of the benefits of a strict schema, I would suggest implementing it first and the second one in a later version.
Regards, Florian
|
|
I agree with your logic. I'm inclined to work through the first to see if we run into any other pitfalls and then I think that will ultimately help guide us if we decide to make PropertyKeys local to specific vertex and edge types. I just added an issue[1] with a first cut at the idea from an API standpoint for us to throw darts at and update as needed.
Thanks, Ted
I also see it the same way as Florian.
As long as there are no property-specific constraints like string length, we can work with global property definitions. After all, the property name itself often implies its data type across different element types, so type conflicts are unlikely (e.g. Name -> String, Size -> Long).
But once there are such constraints (we implement them within our abstraction), element-specific property definitions could prove useful. For example, this information could be communicated to storage backends so that they can take advantage of reduced storage size and faster querying rather than seeing a serialized Java object (e.g. map String(255) to VARCHAR(255)).
-Rainer Pichler https://twitter.com/rainerpichler
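The idea of element-specific constraints that a backend could exploit can be sketched as follows. This is a hypothetical model in plain Python, not an existing JanusGraph feature: `PropertyConstraint`, `sql_column_type`, `check`, the `File` label, and the chosen column types are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyConstraint:
    """Constraint attached to one (label, key) pair (hypothetical)."""
    data_type: type
    max_length: Optional[int] = None  # only meaningful for str values
    not_null: bool = False


def sql_column_type(constraint):
    """Pick a column type a relational-style backend might use (illustrative mapping)."""
    if constraint.data_type is str:
        base = f"VARCHAR({constraint.max_length})" if constraint.max_length else "TEXT"
    elif constraint.data_type is int:
        base = "BIGINT"
    else:
        base = "BLOB"  # fallback: store an opaque serialized object
    return base + (" NOT NULL" if constraint.not_null else "")


def check(constraint, value):
    """Return True if the value satisfies the constraint."""
    if value is None:
        return not constraint.not_null
    if not isinstance(value, constraint.data_type):
        return False
    if isinstance(value, str) and constraint.max_length is not None:
        return len(value) <= constraint.max_length
    return True


# Constraints are keyed by (label, key), so another label could constrain "name" differently.
constraints = {
    ("Person", "name"): PropertyConstraint(str, max_length=255, not_null=True),
    ("File", "size"): PropertyConstraint(int),
}
```

With a concrete type and length known per element type, a backend could in principle choose a compact column type instead of falling back to a serialized object, which is the storage benefit described above.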
On Monday, January 8, 2018 at 13:58:39 UTC+1, Florian Hockmann wrote:
Hi Ted,
I think the second option would be the better one in the long term, as it allows the same property key to be defined separately for different vertex labels where it has different meanings. We currently often include the vertex label in the property key as a workaround, to avoid problems with adding indexes on property keys that already existed when new vertex labels are introduced. So we have property keys like CityName, CountryName, and so on. That shouldn't be necessary anymore with your second option.
However, it's probably much easier to implement the first option, as it's closer to the way property keys currently work in JanusGraph. Since even the first option would bring most of the benefits of a strict schema, I would suggest implementing it first and the second one in a later version.
We are currently evaluating JanusGraph for our new project, and it so happens that we had a discussion just this morning about the exact use case Ted describes here:
the "name" property on Person, could be of a different type than the "name" Property on a "Building" vertex.
We found this important feature missing from JanusGraph, and that can impact our final decision. Obviously, we thought about the same workaround that Florian mentioned: prefixing the property key name with the vertex label.
This would be a very useful addition to JanusGraph.
Finally, this is the article I promised: https://www.celum.com/de/blog/type-safe-graph-queries
It first discusses a file-system-like part of CELUM's data model, which could serve as practical input to this discussion. It then shows in depth how a query can make use of this type-safe data model.
-Rainer Pichler