Re: Serialization error in JanusGraph libraries for Python (Geo Predicate)


Debasish Kanhar <d.k...@...>
 

Hi Florian.

Thanks for all those tips. I made a lot of progress today with Geoshapes. Thanks for pointing out that Geoshapes and Geo predicates are two separate things. It was an oversight on my part: I had not understood that a Geoshape is a data type that needs a serializer and deserializer of its own, whereas Geo predicates are conditions that let us query based on Geoshapes.

Also thanks for pointing out that first I should try adding a Geoshape, then try querying it back, followed by testing for Geo predicates.

As for progress, I was able to write a serializer and deserializer for Geoshapes (Point and Circle). I was also able to query a Geoshape back, but Geo predicates are failing. These are the steps I follow:

NOTE: The following was tested with GraphSON v3.0 on the Graph of the Gods graph.

Adding a Geoshape from Python:
bull = g.addV("monster").property("name", "Erymanthian Boar").next()
hercules = g.V().has("name", Text.textContainsFuzzy("herculeas")).next()
erymanthos = Point(30.58, 20.50)
edgeAdded = g.V(bull).as_("to").V(hercules).addE("battled").property("time", 4).property("place", erymanthos).to("to").next()
print(edgeAdded)

The edge gets added: I see a valid edge ID, and my edge count has increased.
When I run valueMap I do see results, but I suspect something is wrong. (Is this the valueMap you would expect to see on the .NET side?)

print("Testing retrival of GeoShapes")
edgesH = g.V().has("name", Text.textContainsFuzzy("herculeas")).outE("battled").valueMap("place").toList()
print(edgesH)

And, the following is my output:
Testing retrival of GeoShapes
[{'place': {'@type': 'janusgraph:Geoshape', '@value': {'coordinates': [23.700001, 38.099998]}}}, {'place': {'@type': 'janusgraph:Geoshape', '@value': {'coordinates': [23.9, 37.700001]}}}, {'place': {'@type': 'janusgraph:Geoshape', '@value': {'coordinates': [37.58, 21.5]}}}, {'place': {'@type': 'janusgraph:Geoshape', '@value': {'coordinates': [22.0, 39.0]}}}]

From your previous post and the GeoJSON websites, I would expect something like the following inside the @value key:

{
    "type": "Point",
    "coordinates": [125.6, 10.1]
}

but I don't see a "type" key here. Does that mean the serialization is wrong? When I query from the console, though, it works:
gremlin> :> gg.V().has("name", Text.textContainsFuzzy("herculeas")).outE("battled").valueMap()
==>[time:1,place:POINT (23.700001 38.099998)]
==>[time:2,place:POINT (23.9 37.700001)]
# This is the edge which was added
==>[time:4,place:POINT (37.58 21.5)]
==>[time:12,place:POINT (22 39)]
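
For reference, a minimal sketch of a client-side deserializer for the payload Python printed above. The Point class and the point_from_graphson function are hypothetical names, not the janusgraph-python API, and the [longitude, latitude] coordinate order is an assumption inferred from the console's POINT (x y) output. It accepts coordinates either directly under @value or nested under a "geometry" key, since it is not yet clear which layout is the correct one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical client-side Point; not the janusgraph-python class."""
    latitude: float
    longitude: float


def point_from_graphson(payload):
    """Turn a {'@type': 'janusgraph:Geoshape', '@value': {...}} dict into a Point.

    Assumes the coordinates arrive as [longitude, latitude].
    """
    if payload.get("@type") != "janusgraph:Geoshape":
        raise ValueError("not a Geoshape payload: %r" % (payload,))
    value = payload["@value"]
    # Tolerate both layouts: coordinates directly under @value,
    # or nested one level deeper under "geometry".
    geometry = value.get("geometry", value)
    # The Python output above carries no "type" key, so default to Point.
    shape_type = geometry.get("type", "Point")
    if shape_type != "Point":
        raise ValueError("unsupported shape: %s" % shape_type)
    longitude, latitude = geometry["coordinates"]
    return Point(latitude=float(latitude), longitude=float(longitude))


raw = {"@type": "janusgraph:Geoshape",
       "@value": {"coordinates": [23.700001, 38.099998]}}
print(point_from_graphson(raw))
```

If the reader returns raw dicts like the ones printed above, a function of this kind could also be applied to the valueMap results by hand while the deserializer registration is being sorted out.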

Now, coming to the issue with the Geo predicate: correct me if I'm wrong, but I think I need to do the following steps in order:

1: Create a Circle class.
2: Create a Circle object.
3: Pass that Python (?) object to the geoWithin predicate. (This fails)

My question is: how will Gremlin know that my Python Circle class should be mapped to the JanusGraph Circle class?

point = Circle(22, 39, 50)
edges = g.E().has("place", Geo.geoWithin(point)).valueMap(True).toList()

The above works on console but fails on Python:
<janusgraph_python.core.datatypes.Circle.Circle object at 0x0000000003D5ADA0>
geoWithin(<janusgraph_python.core.datatypes.Circle.Circle object at 0x0000000003D5ADA0>)
Traceback (most recent call last):
  File "C:/Users/IBM_ADMIN/Anaconda36/Scripts/janusgraph-python", line 68, in <module>
    edges = g.E().has("place", Geo.geoWithin(point)).valueMap(True).toList()
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\process\traversal.py", line 52, in toList
    return list(iter(self))
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\process\traversal.py", line 43, in __next__
    self.traversal_strategies.apply_strategies(self)
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\process\traversal.py", line 346, in apply_strategies
    traversal_strategy.apply(traversal)
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\driver\remote_connection.py", line 143, in apply
    remote_traversal = self.remote_connection.submit(traversal.bytecode)
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\driver\driver_remote_connection.py", line 54, in submit
    results = result_set.all().result()
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\driver\resultset.py", line 81, in cb
    f.result()
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\concurrent\futures\_base.py", line 425, in result
    return self.__get_result()
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\concurrent\futures\thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\driver\connection.py", line 77, in _receive
    self._protocol.data_received(data, self._results)
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\driver\protocol.py", line 98, in data_received
    "{0}: {1}".format(status_code, message["status"]["message"]))
gremlin_python.driver.protocol.GremlinServerError: 500: Could not call index

My geoWithin method is as follows:
def geoWithin(self, value):
    withinP = P(self.toString(), value)
    print(withinP)

    return withinP

This line:
geoWithin(<janusgraph_python.core.datatypes.Circle.Circle object at 0x0000000003D5ADA0>)
clearly means that I'm passing a pure Python object to the predicate class.

I specify the Python classes to serialize in my GraphSONWriter class (graphsonWriter).
The whole project is also hosted at: janusgraph-python, under the latest branch.
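
A serializer only takes effect if the writer can find it by the Python type of the value it is asked to write. The stand-alone sketch below illustrates that type-keyed lookup. Circle, CircleSerializer and SketchWriter are hypothetical stand-ins written for illustration, not the gremlin_python or janusgraph-python classes, and the payload layout is an assumption. In gremlin_python itself, GraphSONWriter accepts a serializer_map argument for the same purpose.

```python
import json


class Circle:
    """Hypothetical stand-in for the project's Circle datatype."""

    def __init__(self, coordinates, radius):
        self.coordinates = list(coordinates)
        self.radius = radius


class CircleSerializer:
    """Hypothetical serializer; the payload layout is an assumption."""

    @classmethod
    def dictify(cls, circle, writer):
        return {
            "@type": "janusgraph:Geoshape",
            "@value": {"geometry": {
                "type": "Circle",
                "coordinates": circle.coordinates,
                "radius": circle.radius,
            }},
        }


class SketchWriter:
    """Stand-in writer: picks a serializer by the value's Python type."""

    def __init__(self, serializer_map):
        self.serializers = dict(serializer_map)

    def to_dict(self, obj):
        serializer = self.serializers.get(type(obj))
        if serializer is not None:
            return serializer.dictify(obj, self)
        if isinstance(obj, dict):
            return {key: self.to_dict(val) for key, val in obj.items()}
        if isinstance(obj, (list, tuple)):
            return [self.to_dict(item) for item in obj]
        return obj  # unregistered types pass through unchanged


circle = Circle([22.0, 39.0], 50.0)
registered = SketchWriter({Circle: CircleSerializer})
unregistered = SketchWriter({})

print(json.dumps(registered.to_dict(circle)))
# Without a registered serializer the Circle comes back untouched,
# and json cannot encode an arbitrary Python object.
try:
    json.dumps(unregistered.to_dict(circle))
except TypeError as err:
    print("not serializable:", err)
```

The point of the sketch is only that a Python object reaching the wire as "<... Circle object at 0x...>" suggests the lookup found no serializer for its type.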

It's getting late at night here, but I'll be available for a while longer. I know you aren't versed in Python, so if you need more detail about the issue or about my project structure, let me know and I can provide it.

PS: Well, there is a problem with CIRCLE shape, but we will deal with that after we have finished with Point :-P


On Saturday, 11 August 2018 00:29:03 UTC+5:30, Florian Hockmann wrote:
There is a difference between Geoshapes and Geo predicates. You are right that Geo predicates are similar to Text predicates and don't need their own serializer: the normal serializer for P (that's the PSerializer you listed), which is already included in TinkerPop GLVs like Gremlin-Python, can be used. Geoshapes, however, are objects that represent things like coordinates or a circle around coordinates. See the Geoshape Data Type section of the docs for more information. So, you need one serializer and one deserializer for each Geoshape data type (point, line, circle, and so on).

All the above serializers implement the _GraphSONTypeIO class. So do I need to write another serializer, namely a Geoshape serializer/deserializer, and include it in the list of existing serializers?

Yes, exactly, you need to write similar serializers and deserializers for the Geoshape types.

If so, that can be done. My next question is then: how do I register that serializer whenever I call my Geo predicates? I.e., how do I make the system know to use the serializer created above?

The TinkerPop docs show how such a new serializer can be registered for Gremlin-Python.

Also, thanks for the idea on insertion. I was doing just read operations until now, but I will also test write operations using Geoshapes. But that will also need a serializer to be implemented, right?

Yes, that would require a serializer, but you can't really use Geo predicates without being able to serialize Geoshape types.

On Friday, 10 August 2018 20:28:19 UTC+2, Debasish Kanhar wrote:
Hi Florian,

Thanks for the response. My understanding was that the implementation was going to be similar for all Geoshapes. I guess my understanding was wrong here. Thanks for pointing that out.

If I got it right, do you mean to say that for implementing Geoshapes we will have to write our own serializer and deserializer for Geo predicates? We didn't need to do that while working on Text predicates, though.

Anyway, I was going through the source code of the serializers written in Python (GraphSON 2.0), and found the following serializers/deserializers implemented:

  1. _BytecodeSerializer
  2. TraversalSerializer
  3. VertexSerializer
  4. EdgeSerializer
  5. VertexPropertySerializer
  6. PropertySerializer
  7. TraversalStrategySerializer
  8. EnumSerializer
  9. PSerializer
  10. BindingSerializer
  11. LambdaSerializer

All the above serializers implement the _GraphSONTypeIO class. So do I need to write another serializer, namely a Geoshape serializer/deserializer, and include it in the list of existing serializers?

If so, that can be done. My next question is then: how do I register that serializer whenever I call my Geo predicates? I.e., how do I make the system know to use the serializer created above?

Also, thanks for the idea on insertion. I was doing just read operations until now, but I will also test write operations using Geoshapes. But that will also need a serializer to be implemented, right?

Thanks

On Friday, 10 August 2018 21:30:46 UTC+5:30, Florian Hockmann wrote:
First of all, great to hear that someone is working on a Python driver for JanusGraph!

At a first glance I'd say that the serialization of Geoshape.circle() looks wrong. You serialize them as if they were predicates when they really are objects. In your stack trace it looks like this:

{
    "predicate": "Geoshape.circle",
    "value": "37.97, 23.72, 50"
}

whereas it should look something like this:

{
    "@type": "janusgraph:Geoshape",
    "@value": {
        "geometry": {
            "type": "Circle",
            "coordinates": [
                {
                    "@type": "g:Double",
                    "@value": 37
                },
                {
                    "@type": "g:Double",
                    "@value": 25
                }
            ],
            "radius": {
                "@type": "g:Double",
                "@value": 50
            },
            "properties": {
                "radius_units": "km"
            }
        }
    }
}

The serialization of Geoshapes is really not exactly pretty. I'd say start with the easiest one, namely Geoshape.point. I would also first only insert a Geoshape as a property to JanusGraph and test whether this works. Then, you can retrieve such a property back. (The graph of the gods which is frequently used for integration tests already contains properties for this. I created a Docker image for integration tests that comes already loaded with this graph.) That way, you can be sure that your serialization and deserialization of Geoshapes already work before you use them together with Geo predicates.
You can see how JanusGraph deserializes Geoshapes here.
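
The target structure above can be produced with a few plain functions. This is only a sketch: g_double, circle_to_graphson and geo_within are hypothetical helper names, the two coordinates are emitted in the order given (as in the example above), and wrapping the shape in the g:P envelope seen in the stack trace is an assumption about what the geoWithin argument should look like on the wire.

```python
import json


def g_double(value):
    """Wrap a number as a GraphSON g:Double typed value."""
    return {"@type": "g:Double", "@value": float(value)}


def circle_to_graphson(first, second, radius, radius_units="km"):
    """Build a janusgraph:Geoshape circle as an object, not as a predicate."""
    return {
        "@type": "janusgraph:Geoshape",
        "@value": {
            "geometry": {
                "type": "Circle",
                # Coordinates are passed through in the order given.
                "coordinates": [g_double(first), g_double(second)],
                "radius": g_double(radius),
                "properties": {"radius_units": radius_units},
            }
        },
    }


def geo_within(shape_graphson):
    """Wrap an already-serialized shape as the value of a geoWithin predicate."""
    return {
        "@type": "g:P",
        "@value": {"predicate": "geoWithin", "value": shape_graphson},
    }


# The argument list of the has() step, with the shape sent as an object.
step = ["has", "place", geo_within(circle_to_graphson(37, 25, 50))]
print(json.dumps(step, indent=2))
```

Compared with the failing request, the only predicate left is geoWithin; the circle travels as a typed janusgraph:Geoshape value inside it.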

On Friday, 10 August 2018 12:09:34 UTC+2, Debasish Kanhar wrote:
Hi all,

I'm currently building JanusGraph libraries for Python so that JanusGraph functionality such as indexed lookup and schema management is available from non-JVM languages.

I was planning a 0.1 release in a few weeks with the following features as a starting point:

1: Implement Text predicates, like textContains etc., for Python. (Done)
2: Implement Geo predicates, like geoWithin etc., for Python.
3: To be able to serialize Edge IDs. (Done)

Once 0.1 is out and the project has been made part of official JanusGraph, with docs added, I plan to add a schema management utility for Python, though that is for a later stage.

Is there any big feature that I'm missing? Please point it out.

Now, back to the original query: when I try Geo predicates, my queries fail. I feel that I'm doing something silly. Please tell me if I'm doing anything wrong.

We have Gremlin Python's predicate class P, which I'm using and extending to include JanusGraph functionality like Text and Geo.

Usual declaration for P class for using TinkerPop predicates:

@staticmethod
def between(*args):
    return P("between", *args)

# Query is: g.V().has("age", between(10,20)).next()

Following the pattern above, I implemented something similar for Text predicates:
@staticmethod
def textContains(value):
    predicate = P("textContains", value)
    return predicate

# Query is: g.V().has("name", textContains("saturn")).next()

The above method works, and I'm able to run Text predicate queries from Python using the library I created.

When I try to introduce Geo predicates, everything fails, maybe because of the way I use predicates (I nest them, as follows).

// Query for Geo
g.E().has("place", geoWithin(Geoshape.circle(37,25,50))).next()

NOTE: We have two predicates here: the first is the geoWithin predicate, and inside it we have Geoshape.circle(37,25,50). So I use the following method definition for the geoWithin predicate, a nested predicate system:


def geoWithin(self, value):
    shape = value.getShape()

    shapeP = None

    if shape == "CIRCLE":
        shapeP = P("Geoshape.circle", "{}, {}, {}".format(value.getLatitude(), value.getLongitude(), value.getRadius()))
    elif shape == "POINT":
        shapeP = P("Geoshape.point", "{}, {}".format(value.getLatitude(), value.getLongitude()))

    withinP = P("geoWithin", shapeP)

    return withinP

As you can see, I first create a predicate with "Geoshape.circle" and "37,25,50", then pass that object as the value to a predicate with "geoWithin".

But the above fails with the following Gremlin Server error:

1447671 [gremlin-server-worker-1] WARN  org.apache.tinkerpop.gremlin.driver.ser.AbstractGraphSONMessageSerializerV2d0  - Request [PooledUnsafeDirectByteBuf(ridx: 394, widx: 394, cap: 428)] could not be deserialized by org.apache.tinkerpop.gremlin.driver.ser.AbstractGraphSONMessageSerializerV2d0.
org.apache.tinkerpop.shaded.jackson.databind.JsonMappingException: Could not deserialize the JSON value as required. Nested exception: org.apache.tinkerpop.shaded.jackson.databind.JsonMappingException: Could not deserialize the JSON value as required. Nested exception: org.apache.tinkerpop.shaded.jackson.databind.JsonMappingException: Could not deserialize the JSON value as required. Nested exception: java.lang.IllegalStateException: org.apache.tinkerpop.gremlin.process.traversal.P.Geoshape.circle(java.lang.Object)
 at [Source: (byte[])"{"requestId":{"@type":"g:UUID","@value":"b053215c-5a60-41d8-bdb7-05b8583ac901"},"processor":"traversal","op":"bytecode","args":{"gremlin":{"@type":"g:Bytecode","@value":{"step":[["E"],["has","place",{"@type":"g:P","@value":{"predicate":"geoWithin","value":{"@type":"g:P","@value":{"predicate":"Geoshape.circle","value":"37.97, 23.72, 50"}}}}],["inV"],["valueMap",true]]}},"aliases":{"g":"gg"}}}"; line: 1, column: 338]
 at [Source: (byte[])"{"requestId":{"@type":"g:UUID","@value":"b053215c-5a60-41d8-bdb7-05b8583ac901"},"processor":"traversal","op":"bytecode","args":{"gremlin":{"@type":"g:Bytecode","@value":{"step":[["E"],["has","place",{"@type":"g:P","@value":{"predicate":"geoWithin","value":{"@type":"g:P","@value":{"predicate":"Geoshape.circle","value":"37.97, 23.72, 50"}}}}],["inV"],["valueMap",true]]}},"aliases":{"g":"gg"}}}"; line: 1, column: 338]
 at [Source: (byte[])"{"requestId":{"@type":"g:UUID","@value":"b053215c-5a60-41d8-bdb7-05b8583ac901"},"processor":"traversal","op":"bytecode","args":{"gremlin":{"@type":"g:Bytecode","@value":{"step":[["E"],["has","place",{"@type":"g:P","@value":{"predicate":"geoWithin","value":{"@type":"g:P","@value":{"predicate":"Geoshape.circle","value":"37.97, 23.72, 50"}}}}],["inV"],["valueMap",true]]}},"aliases":{"g":"gg"}}}"; line: 1, column: 338] (through reference chain: java.util.LinkedHashMap["args"]->java.util.LinkedHashMap["gremlin"])
        at org.apache.tinkerpop.shaded.jackson.databind.JsonMappingException.from(JsonMappingException.java:270)
        at org.apache.tinkerpop.shaded.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:1711)
        at org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONTypeDeserializer.deserialize(GraphSONTypeDeserializer.java:194)
        at org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONTypeDeserializer.deserializeTypedFromAny(GraphSONTypeDeserializer.java:101)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserializeWithType(UntypedObjectDeserializer.java:712)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:529)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
        at org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONTypeDeserializer.deserialize(GraphSONTypeDeserializer.java:219)
        at org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONTypeDeserializer.deserializeTypedFromAny(GraphSONTypeDeserializer.java:101)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserializeWithType(UntypedObjectDeserializer.java:712)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:529)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
        at org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONTypeDeserializer.deserialize(GraphSONTypeDeserializer.java:212)
        at org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONTypeDeserializer.deserializeTypedFromObject(GraphSONTypeDeserializer.java:86)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.std.MapDeserializer.deserializeWithType(MapDeserializer.java:400)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:68)
        at org.apache.tinkerpop.shaded.jackson.databind.DeserializationContext.readValue(DeserializationContext.java:759)
        at org.apache.tinkerpop.shaded.jackson.databind.DeserializationContext.readValue(DeserializationContext.java:746)
        at org.apache.tinkerpop.gremlin.structure.io.graphson.AbstractObjectDeserializer.deserialize(AbstractObjectDeserializer.java:48)
        at org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONTypeDeserializer.deserialize(GraphSONTypeDeserializer.java:212)
        at org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONTypeDeserializer.deserializeTypedFromAny(GraphSONTypeDeserializer.java:101)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.std.StdDeserializer.deserializeWithType(StdDeserializer.java:136)
        at org.apache.tinkerpop.shaded.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:68)
        at org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4001)
        at org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3079)

Looks like some sort of serialization error. I have set GraphSON 2.0 on both Gremlin Server and my Python driver.

Is there something I'm missing here?
