HBase table definition and how flexible to change it?


Demai <nid...@...>
 

Hi guys,

I am new to this forum and looking for a few pointers.

I am fairly familiar with HBase, so I plan to use it as the backend. I went through the 'getting-started' tutorial and have the example up and running. I am looking for a few pointers about the storage design (column families, row key) and how and why it was designed that way. That leads to my next question: is it flexible (and reasonable, and beneficial) to store vertices, edges and properties separately? If so, which code should I pay attention to?

Many thanks

Demai


Irving Duran <irvin...@...>
 

This is a good video that I would recommend -> https://youtu.be/tLR-I53Gl9g

I would keep your vertices, edges, and properties together.


Thank You,

Irving Duran

On Thu, Feb 9, 2017 at 12:49 PM, Demai <nid...@...> wrote:

--
You received this message because you are subscribed to the Google Groups "JanusGraph developer list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Demai Ni <nid...@...>
 

Irving, 

Thanks for the pointers.

Neo4j still runs on a single server, though there are efforts around data distribution/partitioning to support a true cluster. Maybe Neo4j doesn't worry about scalability yet. Well, I don't really know much about it either, so I should leave it to the experts to comment.

But scalability is something I care about. That is why I am looking at JanusGraph.

Demai 

On Thu, Feb 9, 2017 at 2:06 PM, Irving Duran <irvin...@...> wrote:
Hi Demai,
I think it goes all the way back to when graph theory started.

Maybe one of these links will give you the answer that you are seeking.

I am not sure about Neo4j. I played with it a couple of years ago (when Hadoop support was being looked at). The problem that I ran into was scalability. That's why I made the switch to GraphX (Apache Spark) and am now looking back into JanusGraph.

I hope this helps.



Thank You,

Irving Duran

On Thu, Feb 9, 2017 at 3:54 PM, Demai Ni <nid...@...> wrote:
Irving, 


Thanks. I just watched the whole presentation. It is pretty helpful for understanding TinkerPop. However, it doesn't mention the storage design or why vertices, edges, and properties should be kept together. On another note, does Neo4j keep the three in separate files?

Anyway, appreciate the pointer

On Thu, Feb 9, 2017 at 12:58 PM, Irving Duran <irvin...@...> wrote:


Jerry He <jerr...@...>
 

The edges, vertices, properties and indexes are stored in a fixed table and a fixed set of column families within the table in HBase. i.e. edges are in one CF, properties are in another CF.
They are linked via the rowkey / ID.
It is probably not clearly documented anywhere. You may need to look into the org.janusgraph.diskstorage.hbase package.
You can also start up JanusGraph with HBase and create the sample graph, then look at and scan the table to get a feel for it.

Jerry


Demai <nid...@...>
 

Jerry,

Thanks for the pointer. Since they are stored in different CFs, it is a bit similar to Neo4j. I have JanusGraph up and running on my Mac on top of HBase 1.2. Thanks for the effort to make it support newer HBase versions; I will play with it as you suggested.

Demai


On Friday, February 10, 2017 at 10:33:39 AM UTC-8, Jerry He wrote:


Demai <nid...@...>
 

Interesting. I followed the example from the 'getting started' section. The HBase table contains quite a few (9 to be exact) column families, and the column family names run from 'e' to 't', which I guess are generated. It is kind of hard to tell what's in there, let alone figure out which ones contain vertices and edges.


hbase(main):025:0> describe 'janusgraph'
Table janusgraph is ENABLED
janusgraph
COLUMN FAMILIES DESCRIPTION
{NAME => 'e', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'g', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'h', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'i', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'l', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => '604800 SECONDS (7 DAYS)', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'm', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 's', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 't', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
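
When comparing many near-identical column family lines like the ones above, it can help to pull out just the attributes that differ. The following is a minimal sketch, assuming the `describe` output has been copied into a string; `DESCRIBE_OUTPUT`, `parse_families` and `non_default_ttl` are hypothetical names used only for illustration, and the two sample rows are abbreviated from the real output.

```python
import re

# Two rows abbreviated from the `describe 'janusgraph'` output (illustrative only).
DESCRIBE_OUTPUT = """\
{NAME => 'e', BLOOMFILTER => 'ROW', VERSIONS => '1', TTL => 'FOREVER', COMPRESSION => 'GZ'}
{NAME => 'l', BLOOMFILTER => 'ROW', VERSIONS => '1', TTL => '604800 SECONDS (7 DAYS)', COMPRESSION => 'GZ'}
"""

# Matches attribute pairs of the form KEY => 'value'.
ATTR = re.compile(r"(\w+) => '([^']*)'")


def parse_families(text):
    """Return one attribute dict per line that describes a column family."""
    families = []
    for line in text.splitlines():
        attrs = dict(ATTR.findall(line))
        if "NAME" in attrs:
            families.append(attrs)
    return families


def non_default_ttl(families):
    """Map column family name to TTL for families whose TTL is not FOREVER."""
    return {f["NAME"]: f["TTL"] for f in families if f.get("TTL") != "FOREVER"}


if __name__ == "__main__":
    fams = parse_families(DESCRIBE_OUTPUT)
    print([f["NAME"] for f in fams])
    print(non_default_ttl(fams))
```

Run against the full output, this would show that 'l' is the only family with a finite TTL, which is one hint about what it stores.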

On Friday, February 10, 2017 at 12:03:51 PM UTC-8, Demai wrote:


Jerry He <jerr...@...>
 

https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-hbase-parent/janusgraph-hbase-core/src/main/java/org/janusgraph/diskstorage/hbase/HBaseStoreManager.java#L246

This tells you the column family mappings. 

Jerry
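
For readers who do not want to open the source, here is a minimal sketch of how the single-letter column families in the `describe` output might map back to JanusGraph store names. The store names are recalled from the short-name table in `HBaseStoreManager` around the time of this thread and are an assumption, not a copy of the linked file; `SHORT_CF_NAMES` and `store_for` are hypothetical names used only for illustration. Check the linked source before relying on any entry.

```python
# ASSUMPTION: recalled from HBaseStoreManager's short-name table; verify
# each entry against the linked source before relying on it.
SHORT_CF_NAMES = {
    "e": "edgestore",
    "f": "edgestore_lock_",
    "g": "graphindex",
    "h": "graphindex_lock_",
    "i": "janusgraph_ids",
    "l": "txlog",
    "m": "systemlog",
    "s": "system_properties",
    "t": "system_properties_lock_",
}


def store_for(short_name):
    """Return the assumed store name for a short column family name."""
    return SHORT_CF_NAMES.get(short_name, "unknown")


if __name__ == "__main__":
    # Print the assumed mapping in the same order as the describe output.
    for short in sorted(SHORT_CF_NAMES):
        print(f"{short} -> {store_for(short)}")
```

If this mapping holds, 'l' being the transaction log would be consistent with it being the only family that has a 7-day TTL in the `describe` output.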


Demai Ni <nid...@...>
 

Jerry,

Thanks a lot. That is exactly what I was looking for.

Demai

On Mon, Feb 13, 2017 at 3:12 PM, Jerry He <jerr...@...> wrote:
