Backend data model deserialization


Elliot Block <eblock@...>
 

Hello,

Is there any supported way (e.g. a class/API) for deserializing raw data model rows, i.e. to get from raw Bigtable bytes to Vertex/edge list objects (in Java)?

https://docs.janusgraph.org/advanced-topics/data-model/

We're on the Cloud Bigtable storage backend, and it has excellent support for bulk exporting Bigtable rows (e.g. to Parquet in GCS), but we're unclear how to deserialize the raw Bigtable row/cell bytes back into usable Vertex objects.  If we were to build support for something like this, would it be a candidate for contribution back into the project?  Or is it misunderstanding the intended API/usage path?

Any thoughts greatly appreciated.  Thank you!

- Elliot


hadoopmarc@...
 

Hi Elliot,

There should be some old Titan resources that describe how the data model is binary coded into the row keys and row values. Of course, it is also implicit from the JanusGraph source code.

If you look back at this week's OLAP presentations (https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376) you will see that one of the presenters exactly did what you propose: they exported rows from scylladb and converted it to gryo format for import into TinkerPop HadoopGraph. You might want to contact them to coordinate a possible contribution to the JanusGraph project.

Best wishes,     Marc


Boxuan Li
 

Hi Elliot,

I am not aware of existing utilities for deserialization, but as Marc has suggested, you might want to see if there are old Titan resources regarding it, since the data model hasn’t been changed since Titan -> JanusGraph migration.

If you want to resort to the source code, you could check out EdgeSerializer and IndexSerializer. Here is a simple code snippet demonstrating how to deserialize an edge:

final Entry colVal = StaticArrayEntry.of(StaticArrayBuffer.of(Bytes.fromHexString("0x70a0802140803800")), StaticArrayBuffer.of(Bytes.fromHexString("0x0180a076616c75e5"))); // I retrieved this hex string from Cassandra cqlsh console
final StandardSerializer serializer = new StandardSerializer();
final EdgeSerializer edgeSerializer = new EdgeSerializer(serializer);
RelationCache edgeCache = edgeSerializer.readRelation(colVal, false, (StandardJanusGraphTx) tx); // this is the deserialized edge

Bytes.fromHexString is an utility method provided by datastax cassandra driver. You might use any other library/code to convert hex string to bytes.

As you can see, there is no single easy-to-use API to deserialize raw data. If you end up creating one, I think it would be helpful if you could contribute back to the community.

Best regards,
Boxuan


On May 20, 2021, at 8:07 PM, hadoopmarc@... wrote:

Hi Elliot,

There should be some old Titan resources that describe how the data model is binary coded into the row keys and row values. Of course, it is also implicit from the JanusGraph source code.

If you look back at this week's OLAP presentations (https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376) you will see that one of the presenters exactly did what you propose: they exported rows from scylladb and converted it to gryo format for import into TinkerPop HadoopGraph. You might want to contact them to coordinate a possible contribution to the JanusGraph project.

Best wishes,     Marc


sauverma
 

Hi Elliot

At zeotap we ve taken the same route to enable olap consumers via apache spark. We presented it in the recent janusgraph meet-up at https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376. We are using ScyllaDB as the backend.

Let's get in touch if this aligns with your requirements.

Thanks 


On Thu, May 20, 2021, 6:12 PM Boxuan Li <liboxuan@...> wrote:
Hi Elliot,

I am not aware of existing utilities for deserialization, but as Marc has suggested, you might want to see if there are old Titan resources regarding it, since the data model hasn’t been changed since Titan -> JanusGraph migration.

If you want to resort to the source code, you could check out EdgeSerializer and IndexSerializer. Here is a simple code snippet demonstrating how to deserialize an edge:

final Entry colVal = StaticArrayEntry.of(StaticArrayBuffer.of(Bytes.fromHexString("0x70a0802140803800")), StaticArrayBuffer.of(Bytes.fromHexString("0x0180a076616c75e5"))); // I retrieved this hex string from Cassandra cqlsh console
final StandardSerializer serializer = new StandardSerializer();
final EdgeSerializer edgeSerializer = new EdgeSerializer(serializer);
RelationCache edgeCache = edgeSerializer.readRelation(colVal, false, (StandardJanusGraphTx) tx); // this is the deserialized edge

Bytes.fromHexString is an utility method provided by datastax cassandra driver. You might use any other library/code to convert hex string to bytes.

As you can see, there is no single easy-to-use API to deserialize raw data. If you end up creating one, I think it would be helpful if you could contribute back to the community.

Best regards,
Boxuan


On May 20, 2021, at 8:07 PM, hadoopmarc@... wrote:

Hi Elliot,

There should be some old Titan resources that describe how the data model is binary coded into the row keys and row values. Of course, it is also implicit from the JanusGraph source code.

If you look back at this week's OLAP presentations (https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376) you will see that one of the presenters exactly did what you propose: they exported rows from scylladb and converted it to gryo format for import into TinkerPop HadoopGraph. You might want to contact them to coordinate a possible contribution to the JanusGraph project.

Best wishes,     Marc


Elliot Block <eblock@...>
 

Awesome thank you all for the great info and recent presentations!  We are prototyping bulk export + deserialize from Cloud Bigtable over approx. the next week and will try to report back if we can produce something useful to share.  Thanks again, -Elliot
 
On Thu, May 20, 2021 at 6:45 AM sauverma <saurabhdec1988@...> wrote:
At zeotap we ve taken the same route to enable olap consumers via apache spark. We presented it in the recent janusgraph meet-up at https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376. We are using ScyllaDB as the backend.
  
On Thu, May 20, 2021, 6:12 PM Boxuan Li <liboxuan@...> wrote:
If you want to resort to the source code, you could check out EdgeSerializer and IndexSerializer. Here is a simple code snippet demonstrating how to deserialize an edge:
 
On May 20, 2021, at 8:07 PM, hadoopmarc@... wrote:
If you look back at this week's OLAP presentations (https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376) you will see that one of the presenters exactly did what you propose: they exported rows from scylladb and converted it to gryo format for import into TinkerPop HadoopGraph. You might want to contact them to coordinate a possible contribution to the JanusGraph project. 
_._,_.