What are the implications of using Object.class property type?


Laura Morales <lauretas@...>
 

What are the practical implications of using Object.class as a property type, instead of the other native types (e.g. String.class, Integer.class, etc.)?
IIUC, Object.class means that a property can take a value of any type. But then my questions are:
1. can an Object.class property be indexed?
2. if a property can take multiple values, how does querying work if I query for a string but one of the values is actually an integer?


hadoopmarc@...
 

Hi Laura,

A similar question was posed recently:
https://lists.lfaidata.foundation/g/janusgraph-users/message/5986

So,
1. Only for the CompositeIndex
2. In your specific example, you could use the Java Integer class ( https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html ), because its constructors accept either an int or a String, and it implements the equals() method.
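
As a minimal sketch of that suggestion (plain Java, no JanusGraph involved): the helper name toInteger below is hypothetical and only illustrates coercing a value that arrives as either an Integer or a numeric String into one type, so that equals() compares like with like. It uses Integer.valueOf rather than the constructors, which have been deprecated since Java 9.

```java
final class IntegerNormalizeSketch {
    // Hypothetical helper (not part of JanusGraph): coerce an Integer or a
    // numeric String to Integer before storing or querying.
    static Integer toInteger(Object value) {
        if (value instanceof Integer) {
            return (Integer) value;
        }
        if (value instanceof String) {
            // Throws NumberFormatException for non-numeric text such as "Alice".
            return Integer.valueOf(((String) value).trim());
        }
        throw new IllegalArgumentException("Unsupported value: " + value);
    }

    public static void main(String[] args) {
        // After normalizing, the string "42" and the int 42 are equal.
        System.out.println(toInteger("42").equals(toInteger(42)));   // true
        // Without normalizing, an Integer never equals a String.
        System.out.println(Integer.valueOf(42).equals("42"));        // false
    }
}
```

Note that this only helps when every value is numeric; a value like 'Alice' would be rejected.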

Best wishes,    Marc


Laura Morales <lauretas@...>
 

I'd like to understand a little bit more about what's going on under the hood when creating a new property with .dataType(Object.class) vs. any other specific type, e.g. .dataType(String.class) or .dataType(Integer.class).
I'm able to create a "name" property like this:

mgmt.makePropertyKey('name').dataType(Object.class).make()

and this is what I've noticed:

- it allows me to create these vertexes
g.addV('alice').property('name', 'Alice')
g.addV('terminator').property('name', 42)
- it allows me to create a composite index, but not a mixed index (confirming what was said in the other thread)
- the composite index works when searching for an exact match, i.e. .has('name', 'Alice') and .has('name', 42). The composite index does not work when searching by comparison, i.e. .has('name', lt(50)): I get the usual warning "Query requires iterating over all vertices" and it returns zero vertices
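
One way to picture the exact-match vs. comparison difference is the following plain-Java analogy. It is only a sketch under the assumption that an equality-only index behaves like a hash map keyed by the property value; the class and method names are hypothetical and this is not JanusGraph's implementation. It also does not reproduce the zero-result behaviour reported above; it only shows why a range predicate cannot be answered by a single key lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EqualityIndexSketch {
    // property value -> vertex names; stand-in for an equality-only index
    private final Map<Object, List<String>> index = new HashMap<>();

    void add(String vertex, Object value) {
        index.computeIfAbsent(value, k -> new ArrayList<>()).add(vertex);
    }

    // Exact match: a single lookup, regardless of the value's type.
    List<String> hasEqual(Object value) {
        return index.getOrDefault(value, Collections.<String>emptyList());
    }

    // Comparison: no key to look up, so every entry is visited, and only
    // Integer values can be compared with the bound at all.
    List<String> hasLessThan(int bound) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Object, List<String>> e : index.entrySet()) {
            if (e.getKey() instanceof Integer && (Integer) e.getKey() < bound) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        EqualityIndexSketch idx = new EqualityIndexSketch();
        idx.add("alice", "Alice");
        idx.add("terminator", 42);
        System.out.println(idx.hasEqual("Alice"));
        System.out.println(idx.hasEqual(42));
        System.out.println(idx.hasLessThan(50));
    }
}
```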

I'm only interested in this because I have a graph that multiple people contribute to, and it would be very nice not to have to deal with explicit property types, if Object.class is an option. For my particular use case I could live without mixed indexes, and I wouldn't mind a small performance penalty (size and/or speed) introduced by using Object as a general type. But I really struggle to understand what's going on. What's the difference between Object and specific types from Janus' point of view? Are types only useful for enforcing a particular schema when inserting data, or is there more to it?





hadoopmarc@...
 

Hi Laura,

Some remarks:
  • primitive types can be stored more efficiently than general objects (an integer is exactly 32 bits, an object can be any size)
  • for the CompositeIndex the objects are fine as long as they implement the equals() method
  • your graph will be more difficult to use if you do not know what data type is in a property. Users would have to explore for themselves what object types are in the graph. This means a lot of OLAP queries with lots of waiting time and a large load on the graph system. If users only use their own data, you can give each user its own graph.
  • if you use Gremlin Server, the supported protocols will not know how to serialize arbitrary objects, so you have to instruct the client to request the string values of objects only. Then, you could have asked your users to enter object strings into janusgraph in the first place.
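
The remark about equals() can be illustrated with a plain-Java analogy. This is a sketch only: it assumes an equality lookup behaves like a HashMap key lookup, and the classes RawTag and ValueTag are hypothetical, not JanusGraph types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class EqualsLookupSketch {
    // No equals()/hashCode(): two instances with the same content are
    // treated as different keys.
    static final class RawTag {
        final String text;
        RawTag(String text) { this.text = text; }
    }

    // Value-based equals()/hashCode(): same content means same key.
    static final class ValueTag {
        final String text;
        ValueTag(String text) { this.text = text; }
        @Override public boolean equals(Object o) {
            return o instanceof ValueTag && text.equals(((ValueTag) o).text);
        }
        @Override public int hashCode() { return Objects.hash(text); }
    }

    // Store one value, then probe with a separately constructed value.
    static boolean foundAfterInsert(Object stored, Object probe) {
        Map<Object, String> index = new HashMap<>();
        index.put(stored, "v1");
        return index.containsKey(probe);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterInsert(new RawTag("x"), new RawTag("x")));     // false
        System.out.println(foundAfterInsert(new ValueTag("x"), new ValueTag("x"))); // true
    }
}
```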

Best wishes,    Marc


Laura Morales <lauretas@...>
 

your graph will be more difficult to use if you do not know what data type is in a property. Users would have to explore for themselves what object types are in the graph. This means a lot of OLAP queries with lots of waiting time and a large load on the graph system.
What would be an example of an OLAP query? Is there a way to get a property's type in a Gremlin query?


hadoopmarc@...
 

Hi Laura,

One code example says more than 1000 words:

gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g=graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.addV().property("lang", 45)
==>v[13]
gremlin> g.V().elementMap()
==>[id:1,label:person,name:marko,age:29]
==>[id:2,label:person,name:vadas,age:27]
==>[id:3,label:software,name:lop,lang:java]
==>[id:4,label:person,name:josh,age:32]
==>[id:5,label:software,name:ripple,lang:java]
==>[id:6,label:person,name:peter,age:35]
==>[id:13,label:vertex,lang:45]
gremlin> g.V().values("lang")
==>java
==>java
==>45
gremlin> g.V().values("lang").group().by(map{it->it.get().getClass()}).by(count())
==>[class java.lang.String:2,class java.lang.Integer:1]
gremlin>

So, this query shows you all occurring data types of a specific property in the graph.
Strictly speaking, Gremlin OLAP queries are queries using withComputer(). I tend to use the term a bit more loosely, to include analytical queries that require a full table scan.
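
For readers less familiar with the group().by(...).by(count()) idiom, the same counting can be sketched in plain Java over an in-memory list. The class and method names are hypothetical and the list simply stands in for the values returned by g.V().values("lang").

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TypeHistogramSketch {
    // Key function: the runtime class of a value.
    static Class<?> typeOf(Object value) {
        return value.getClass();
    }

    // Count how many values of each runtime class occur in the list.
    static Map<Class<?>, Long> countByType(List<Object> values) {
        return values.stream()
                .collect(Collectors.groupingBy(TypeHistogramSketch::typeOf, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Object> langValues = Arrays.<Object>asList("java", "java", 45);
        // Two String values and one Integer value; map ordering is not guaranteed.
        System.out.println(countByType(langValues));
    }
}
```

Like the Gremlin version, this has to touch every value, which is why it amounts to a full scan on a real graph.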

Best wishes,

Marc