Re: Serialization error in JanusGraph libraries for Python (Geo Predicate)


Debasish Kanhar <d.k...@...>
 

Thanks a lot, Florian, for pointing out the difference in JSON structure for Circle. I just wrapped the JSON inside a geometry key and kept the rest of my workflow as before, and I was able to get the geoWithin predicate working :-D

Yes Florian, I was planning the same set of steps on my end too. I have a little bandwidth this coming week and want to do as much as possible before I get tied down with other assignments. My initial plan only involved introducing Geo predicates, but as per our discussion it looks like we can't bring in Geo predicates without first developing Geoshapes.

As for releases, I plan the following steps.
Release 0.1 (which I'll probably do by the end of the week) shall include the following features:

  1. Search/query based on Text predicates.
  2. I don't need a serializer and deserializer for RelationIdentifier, as I think that's already implemented in Gremlin-Python. (I was able to run valueMap queries where the result is a list of edges, and that worked. Even the query you posted on SO related to edge ID serialization worked.)
  3. Geoshapes for Point and Circle. (Both look to be done.)
  4. Geo predicate for geoWithin. (This also looks to be done.)
  5. Unit test cases. (I'm a bad developer when it comes to unit tests; I haven't written a single one for this lib yet.)
  6. Wrapping all of the above in a single JanusGraphConnectionBuilder, so that we can hide the graphson_reader and graphson_writer configuration from the end user.
  7. Cleaning and restructuring of the codebase, and better comments.
  8. Updating the documentation and making it part of the JanusGraph documentation, like you have for .NET.
  9. Releasing 4 versions of the lib, corresponding to JG 0.1.1, JG 0.2.0, JG 0.2.1 and JG 0.3.0, with the matching GraphSON and TinkerPop versions. (Initially I plan just the 0.3.0 version, as all my recent tests have been on GraphSON 2.0, but I will decide on the other versions depending on bandwidth.)
Once 0.1 is out, along with documentation, I will freeze the code at that point and create GitHub issues for the additional features that need to be added. I want to have a strong base ready, so that even if I get disconnected from the project in future, other contributors can continue the work.

As for the Travis CI pipeline, that is already set up, and my latest build from master gets pushed to the artifact repository (in this case PyPI, the Python package index).

As for future features, I have the following in mind, though let me know if I missed anything:

  1. Other Geoshapes like Box, Rectangle, Polygon etc. (I won't do shapes from WKT for now, as that seems the most complicated of the lot.)
  2. Other Geo predicates. (This can be merged with the previous release, depending on how much additional code needs to be written.)
  3. Schema management. (This is a huge headache in itself. How do we open the JanusGraphManagement system from a GraphTraversalSource? With a remote traversal, irrespective of programming language, we first start with an empty graph and then create a traversal source that uses our remote connection. So how do we start the management system from either the empty graph or the GraphTraversalSource?)
My final aim is to make this project part of the JanusGraph project, and I feel my plan, which is similar to the one you suggested with a few extra features, is almost in sync with what you have planned too. Now that geoWithin is working, it feels like the above plan, though maybe ambitious, is achievable.

Let me know if I missed anything, or if there is some issue with the plan. This is the first time I'm contributing to the open source community, and the first time I'm developing a library in Python, so I'm bound to make mistakes.

Anyways, Thanks a lot Florian for helping me out over last few days. This has really been helpful :-)

On Sunday, 12 August 2018 15:03:21 UTC+5:30, Florian Hockmann wrote:
The serialization of Geoshape.Point needs to look exactly like what you got back from the server:

{'@type': 'janusgraph:Geoshape', '@value': {'coordinates': [23.700001, 38.099998]}}

My advice for implementing those serializers and deserializers is to first write only unit tests to verify that they work correctly. That enables you to verify that you have the data in exactly the format JanusGraph is expecting. Testing the serialization by sending data to the server just makes it harder to figure out where an error is. This test for example verifies that the serialization of Geoshape.Point works correctly in JanusGraph.Net. (Note also that latitude and longitude are switched in the serialized JSON, as that's how JanusGraph expects the coordinates.) The serialization tests of TinkerPop's Gremlin-Python could be a good starting point for your Python library.
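
To illustrate the test-first idea, here is a minimal sketch in Python that uses only the standard library and never talks to a server. The `Point` class and the `point_to_graphson` helper are hypothetical stand-ins, not part of any released library; the expected dictionary follows the format quoted above, and the longitude-before-latitude order is an assumption based on the remark about switched coordinates.

```python
import unittest


class Point:
    """Hypothetical stand-in for a Geoshape point (latitude, longitude)."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def point_to_graphson(point):
    """Build the GraphSON dict for a point (assumption: longitude first)."""
    return {
        "@type": "janusgraph:Geoshape",
        "@value": {"coordinates": [point.longitude, point.latitude]},
    }


class PointSerializationTest(unittest.TestCase):
    def test_point_matches_server_format(self):
        # Expected value copied from the server response quoted in this thread.
        expected = {
            "@type": "janusgraph:Geoshape",
            "@value": {"coordinates": [23.700001, 38.099998]},
        }
        actual = point_to_graphson(Point(38.099998, 23.700001))
        self.assertEqual(actual, expected)
```

Saved in a test module, this can be run with `python -m unittest`; a failure then points directly at the serializer rather than at the connection or the server.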

My plan for JanusGraph.Net is by the way to first only implement a version with very limited functionality, namely only Text predicates, serialization of basic types like RelationIdentifier which is necessary to work with edges and Geoshape.Point as the easiest Geoshape, but no Geo predicates or more complicated Geoshapes. My reasoning for that is that the review will already be quite a burden on reviewers as they have to review a complete library at once, including initial documentation, the build process, and CI with Travis. Other features can be still added later with separate PRs.
I don't know if your intention is to also add this Python library to the JanusGraph project, but if you plan on doing that, then a similar approach could also make sense.

Regarding your problems with Geoshape.Circle, I think that you just need to wrap the value in a geometry object. So, it looks like this in the end:

{
    "@type": "janusgraph:Geoshape",
    "@value": {
        "geometry": {
            "type": "Circle",
            "coordinates": [
                {
                    "@type": "g:Double",
                    "@value": 37
                },
                {
                    "@type": "g:Double",
                    "@value": 25
                }
            ],
            "radius": {
                "@type": "g:Double",
                "@value": 50
            },
            "properties": {
                "radius_units": "km"
            }
        }
    }
}

Here you can see that JanusGraph serializes all Geoshapes, except for Geoshape.Point, with such a geometry object.
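
The wrapping described above can be sketched in a few lines of plain Python. The helper names `g_double` and `circle_to_graphson` are hypothetical and chosen only for illustration; the structure they produce is the one shown in the JSON above, and the coordinates are emitted in the order given, leaving the latitude/longitude order to the caller.

```python
def g_double(value):
    """Wrap a number as a GraphSON g:Double typed value."""
    return {"@type": "g:Double", "@value": value}


def circle_to_graphson(first, second, radius_km):
    """Build the GraphSON dict for a circle, wrapped in a "geometry" key.

    Assumption: the two coordinates are written in the order they are passed.
    """
    geometry = {
        "type": "Circle",
        "coordinates": [g_double(first), g_double(second)],
        "radius": g_double(radius_km),
        "properties": {"radius_units": "km"},
    }
    # Everything except a point is nested under "geometry" inside "@value".
    return {"@type": "janusgraph:Geoshape", "@value": {"geometry": geometry}}


circle = circle_to_graphson(37, 25, 50)
print(circle["@value"]["geometry"]["type"])
```

Compared with the earlier, unwrapped attempt, the only change is the extra `geometry` level between `@value` and the shape fields.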

Am Samstag, 11. August 2018 22:38:25 UTC+2 schrieb Debasish Kanhar:
Okay, adding a few things here. I was doing some quick debugging and added the following logging to the dictify method of my PointSerializer and CircleSerializer classes, which is used during serialization.

serializedJSON = GraphSONUtil.typedValue(cls.GRAPHSON_BASE_TYPE, geometryJSON, cls.GRAPHSON_PREFIX)
print("Serialized json on point being called ")
print(serializedJSON)

Yields (while adding an edge with a Geoshape Point):
Serialized json on point being called
{'@type': 'janusgraph:Geoshape', '@value': {'type': 'Point', 'coordinates': [{'@type': 'g:Double', '@value': 30.58}, {'@type': 'g:Double', '@value': 20.5}]}}

And similarly for Circle Serializer, it yields the following (When I pass Circle object to Geo Predicate query, which I was suspecting was wrong):
Serialised JSON on Circle is being called.
{'@type': 'janusgraph:Geoshape', '@value': {'type': 'Circle', 'coordinates': [{'@type': 'g:Double', '@value': 22}, {'@type': 'g:Double', '@value': 39}], 'radius': {'@type': 'g:Double', '@value': 50}, 'properties': {'radius_units': 'km'}}}

I guess that means the serialization is happening correctly. But maybe the format is wrong? And deserialization seems to be working as mentioned before: I'm able to query back the edges based on outE() from the vertex.


On Sunday, 12 August 2018 01:58:47 UTC+5:30, Debasish Kanhar wrote:
Hi Florian.

Thanks for all those tips. I made a lot of progress today with Geoshapes. Thanks for pointing out that Geoshapes and Geo predicates are 2 separate things. It was an oversight on my part: I had not understood that a Geoshape is actually a data type which needs a serializer and deserializer of its own, whereas Geo predicates are conditions which help us query based on Geoshapes.

Also thanks for pointing out that first I should try adding a Geoshape, then try querying it back, followed by testing for Geo predicates.

As for progress, I was able to write Serializer and Deserializer for Geoshapes (Point and Circle). I was able to query it back also but Geo predicates are failing. I do the following steps:

NOTE: The following was tested on GraphSON v3.0, on Graph of Gods graph

Adding a Geoshape from Python:
bull = g.addV("monster").property("name", "Erymanthian Boar").next()
hercules = g.V().has("name", Text.textContainsFuzzy("herculeas")).next()
erymanthos = Point(30.58, 20.50)
edgeAdded = g.V(bull).as_("to").V(hercules).addE("battled").property("time", 4).property("place", erymanthos).to("to").next()
print(edgeAdded)

The edges get added. I see valid edge IDs, and my edge count has increased.
When I do a valueMap I do see results, but I suspect something is wrong. (Is this the valueMap you would expect to see on your end with .NET?)

print("Testing retrival of GeoShapes")
edgesH = g.V().has("name", Text.textContainsFuzzy("herculeas")).outE("battled").valueMap("place").toList()
print(edgesH)

And, the following is my output:
Testing retrival of GeoShapes
[{'place': {'@type': 'janusgraph:Geoshape', '@value': {'coordinates': [23.700001, 38.099998]}}}, {'place': {'@type': 'janusgraph:Geoshape', '@value': {'coordinates': [23.9, 37.700001]}}}, {'place': {'@type': 'janusgraph:Geoshape', '@value': {'coordinates': [37.58, 21.5]}}}, {'place': {'@type': 'janusgraph:Geoshape', '@value': {'coordinates': [22.0, 39.0]}}}]

Based on your previous post and the GeoJSON websites, I would expect the following inside the @value key:

{
    "type": "Point",
    "coordinates": [125.6, 10.1]
}

but I don't see the "type" key here. Does that mean the serialization is wrong? When I query from the console, however, it works as follows:
gremlin> :> gg.V().has("name", Text.textContainsFuzzy("herculeas")).outE("battled").valueMap()
==>[time:1,place:POINT (23.700001 38.099998)]
==>[time:2,place:POINT (23.9 37.700001)]
# This is the edge which was added
==>[time:4,place:POINT (37.58 21.5)]
==>[time:12,place:POINT (22 39)]
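
On the reading side, a deserializer only has to cope with the payload shapes seen in this thread: a bare `coordinates` list for points, optionally nested under a `geometry` key. The following is a minimal sketch under those assumptions; `parse_geoshape_point` is a hypothetical helper name, and the [longitude, latitude] order is assumed from the console output above, where the values appear in the same order as in the WKT `POINT`.

```python
def _plain(value):
    """Unwrap a GraphSON typed number such as {"@type": "g:Double", "@value": 1.0}."""
    if isinstance(value, dict) and "@value" in value:
        return value["@value"]
    return value


def parse_geoshape_point(graphson):
    """Return (latitude, longitude) from a janusgraph:Geoshape point payload."""
    if graphson.get("@type") != "janusgraph:Geoshape":
        raise ValueError("not a janusgraph:Geoshape value")
    body = graphson["@value"]
    # Tolerate both the bare form and a form nested under "geometry".
    body = body.get("geometry", body)
    longitude, latitude = (_plain(c) for c in body["coordinates"])
    return latitude, longitude


payload = {"@type": "janusgraph:Geoshape",
           "@value": {"coordinates": [23.700001, 38.099998]}}
print(parse_geoshape_point(payload))
```

Note that the function does not require a `type` key, which matches the server responses shown above.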

Now, coming to the issue with the Geo predicate, correct me if these are not the steps I need to do, in order:

1: Create a Circle class.
2: Create a Circle object.
3: Pass that Python (?) object to the geoWithin predicate. (This fails.)

My question is: how will Gremlin know that my Python Circle class is to be mapped to the JanusGraph Circle class?

point = Circle(22, 39, 50)
edges = g.E().has("place", Geo.geoWithin(point)).valueMap(True).toList()

The above works on console but fails on Python:
<janusgraph_python.core.datatypes.Circle.Circle object at 0x0000000003D5ADA0>
geoWithin(<janusgraph_python.core.datatypes.Circle.Circle object at 0x0000000003D5ADA0>)
Traceback (most recent call last):
  File "C:/Users/IBM_ADMIN/Anaconda36/Scripts/janusgraph-python", line 68, in <module>
    edges = g.E().has("place", Geo.geoWithin(point)).valueMap(True).toList()
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\process\traversal.py", line 52, in toList
    return list(iter(self))
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\process\traversal.py", line 43, in __next__
    self.traversal_strategies.apply_strategies(self)
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\process\traversal.py", line 346, in apply_strategies
    traversal_strategy.apply(traversal)
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\driver\remote_connection.py", line 143, in apply
    remote_traversal = self.remote_connection.submit(traversal.bytecode)
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\driver\driver_remote_connection.py", line 54, in submit
    results = result_set.all().result()
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\driver\resultset.py", line 81, in cb
    f.result()
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\concurrent\futures\_base.py", line 425, in result
    return self.__get_result()
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\concurrent\futures\thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\driver\connection.py", line 77, in _receive
    self._protocol.data_received(data, self._results)
  File "C:\Users\IBM_ADMIN\Anaconda36\lib\site-packages\gremlin_python\driver\protocol.py", line 98, in data_received
    "{0}: {1}".format(status_code, message["status"]["message"]))
gremlin_python.driver.protocol.GremlinServerError: 500: Could not call index

My geoWithin method is as follows:
def geoWithin(self, value):
    withinP = P(self.toString(), value)
    print(withinP)

    return withinP

This line:
geoWithin(<janusgraph_python.core.datatypes.Circle.Circle object at 0x0000000003D5ADA0>)
clearly means that I'm passing a pure Python object to the predicate class.

I specify the Python classes to serialize in my GraphSONWriter class (graphsonWriter).
The whole project is also hosted at: janusgraph-python under latest branch.

It's getting late at night, but I'll be available for a little while longer. If you need more description of the issue, as I know you aren't versed in Python, or of my project structure, let me know and I can provide that.

PS: Well, there is a problem with CIRCLE shape, but we will deal with that after we have finished with Point :-P


On Saturday, 11 August 2018 00:29:03 UTC+5:30, Florian Hockmann wrote:
There is a difference between Geoshapes and Geo predicates. You are right that Geo predicates are similar to Text predicates and don't need their own serializer as the normal serializer for P (that's the PSerializer you listed) can be used that is already included in the TinkerPop GLVs like Gremlin-Python. Geoshapes however are objects that represent things like coordinates or a circle around coordinates. See the Geoshape Data Type section of the docs for more information. So, you need one serializer and one deserializer for each Geoshape data type (point, line, circle, and so on).

All the above serializers implement the _GraphSONTypeIO class. So do I need to write another serializer, namely a Geoshape serializer/deserializer, and include it in the list of existing serializers?

Yes, exactly, you need to write similar serializers and deserializers for the Geoshape types.

If so, that can be done. My next question is then: how do I register that serializer whenever I call my Geo predicates? I.e., how do I make the system know to use the serializer created above?

The TinkerPop docs show how such a new serializer can be registered for Gremlin-Python.
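
Conceptually, registration is a lookup from Python type to serializer. The sketch below is a pure-Python stand-in that mimics the idea and is not the Gremlin-Python API: `SimpleWriter`, `PointSerializer` and `Point` are hypothetical names used only for illustration. In Gremlin-Python itself, to my knowledge, `GraphSONWriter` accepts a `serializer_map` argument that plays this role, but please check the TinkerPop docs for the exact usage.

```python
class Point:
    """Hypothetical Geoshape point."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


class PointSerializer:
    """Turns a Point into a GraphSON-style dict (assumption: longitude first)."""

    @classmethod
    def dictify(cls, point, writer):
        return {
            "@type": "janusgraph:Geoshape",
            "@value": {"coordinates": [point.longitude, point.latitude]},
        }


class SimpleWriter:
    """Stand-in writer: picks a serializer by the object's Python type."""

    def __init__(self, serializer_map):
        self.serializers = dict(serializer_map)

    def to_dict(self, obj):
        serializer = self.serializers.get(type(obj))
        if serializer is None:
            return obj  # plain values pass through unchanged
        return serializer.dictify(obj, self)


writer = SimpleWriter({Point: PointSerializer})
print(writer.to_dict(Point(38.1, 23.7)))
```

Because the lookup is by type, a Geoshape passed as the value of a predicate needs no special handling in the predicate itself; the writer finds the serializer when it reaches the object.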

Also, thanks for the idea on insertion. I was doing just read operations till now, but I will also test out write operations using Geoshapes. But that will also need a serializer to be implemented, right?

Yes, that would require a serializer, but you can't really use Geo predicates without being able to serialize Geoshape types.

Am Freitag, 10. August 2018 20:28:19 UTC+2 schrieb Debasish Kanhar:
Hi Florian,

Thanks for the response. My understanding was that the implementation was going to be similar for all Geoshapes. I guess my understanding was wrong here. Thanks for pointing that out.

If I got it right, do you mean that for implementing Geoshapes we will have to write our own serializer and deserializer for Geo predicates? We didn't need to do that while working on Text predicates, though.

Anyway, I was going through the source code of the serializers written in Python (GraphSON 2.0), and I see the following serializers/deserializers implemented:

  1. _BytecodeSerializer
  2. TraversalSerializer
  3. VertexSerializer
  4. EdgeSerializer
  5. VertexPropertySerializer
  6. PropertySerializer
  7. TraversalStrategySerializer
  8. EnumSerializer
  9. PSerializer
  10. BindingSerializer
  11. LambdaSerializer
All the above serializers implement the _GraphSONTypeIO class. So do I need to write another serializer, namely a Geoshape serializer/deserializer, and include it in the list of existing serializers?

If so, that can be done. My next question is then: how do I register that serializer whenever I call my Geo predicates? I.e., how do I make the system know to use the serializer created above?

Also, thanks for the idea on insertion. I was doing just read operations till now, but I will also test out write operations using Geoshapes. But that will also need a serializer to be implemented, right?

Thanks

On Friday, 10 August 2018 21:30:46 UTC+5:30, Florian Hockmann wrote:
First of all, great to hear that someone is working on a Python driver for JanusGraph!

At a first glance I'd say that the serialization of Geoshape.circle() looks wrong. You serialize them as if they were predicates when they really are objects. In your stack trace it looks like this:

{
    "predicate": "Geoshape.circle",
    "value": "37.97, 23.72, 50"
}

whereas it should look something like this:

{
    "@type": "janusgraph:Geoshape",
    "@value": {
        "geometry": {
            "type": "Circle",
            "coordinates": [
                {
                    "@type": "g:Double",
                    "@value": 37
                },
                {
                    "@type": "g:Double",
                    "@value": 25
                }
            ],
            "radius": {
                "@type": "g:Double",
                "@value": 50
            },
            "properties": {
                "radius_units": "km"
            }
        }
    }
}

The serialization of Geoshapes is really not exactly pretty. I'd say start with the easiest one, namely Geoshape.point. I would also first only insert a Geoshape as a property to JanusGraph and test whether this works. Then, you can retrieve such a property back. (The graph of the gods which is frequently used for integration tests already contains properties for this. I created a Docker image for integration tests that comes already loaded with this graph.) That way, you can be sure that your serialization and deserialization of Geoshapes already work before you use them together with Geo predicates.
You can see how JanusGraph deserializes Geoshapes here.

Am Freitag, 10. August 2018 12:09:34 UTC+2 schrieb Debasish Kanhar:
Hi all,

I'm currently building JanusGraph libraries for Python so that we can extend functionalities of JanusGraph indexed lookup and Schema management using non JVM based languages.

I was planning a 0.1 release in a few weeks, with the following features as a starting point:

1: Implement Text predicates, like textContains etc., for Python. (Done)
2: Implement Geo predicates, like geoWithin etc., for Python.
3: Be able to serialize edge IDs. (Done)

Once 0.1 is out, and we have made that project part of official JanusGraph, along with docs added, I was planning to add Schema management utility to Python, though that is for later stage.

Will there be any big feature which I'm missing out on? Please point out.

Now back to original query, so when I try Geo predicates, my queries are failing. I feel that I'm doing something silly. Please suggest me if I'm doing anything wrong.

So, we have Gremlin Python's Predicate class which I'm using and extending my functionality to include JanusGraph functionalities like Text and Geo.

Usual declaration for P class for using TinkerPop predicates:

@staticmethod
def between(*args):
    return P("between", *args)

# Query is g.V().has("age", between(10,20)).next()
Following the pattern above, I implemented something similar for Text predicates as follows:
@staticmethod
def textContains(value):
    predicate = P("textContains", value)
    return predicate

# Query is: g.V().has("name", textContains("saturn")).next()

The abov
