How to persist data on system reboot


pikachu <vmo...@...>
 

Hi all,
I am new to JanusGraph/Titan.

I installed JanusGraph 0.1.0 on AWS EMR, with HBase as the storage backend and Solr as the index backend.

Everything works, but when I reboot the system, all the graph data is lost: g.V() returns an empty array.

HBase is configured to use S3, and I configured Solr following this guide: https://blog.codecentric.de/en/2016/02/getting-started-titan-using-cassandra-solr/

My Gremlin Server configuration is:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.hostname=127.0.0.1
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=solr
index.search.solr.mode=http
index.search.solr.http-urls=http://localhost:8983/solr


How can I persist data between system restarts?


Robert Dale <rob...@...>
 

Do you commit your transactions?
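Robert's question matters because graph changes are staged in the transaction and are only written to the storage backend on commit. Below is a toy Python model of that behavior; the class and method names are hypothetical and are not the JanusGraph API (in the Gremlin console the real call is graph.tx().commit()).

```python
class ToyStore:
    """Stands in for the storage backend (e.g. an HBase table)."""

    def __init__(self):
        self.rows = {}


class ToyTransaction:
    """Toy model, not the JanusGraph API: writes are staged until commit()."""

    def __init__(self, store):
        self.store = store
        self.pending = {}  # staged writes the backend has not seen yet

    def add_vertex(self, vertex_id, label):
        self.pending[vertex_id] = label  # staged only, not yet persisted

    def commit(self):
        self.store.rows.update(self.pending)  # only now does the backend see it
        self.pending = {}

    def rollback(self):
        self.pending = {}  # staged writes are discarded


store = ToyStore()
tx = ToyTransaction(store)
tx.add_vertex(1, "person")
tx.rollback()           # in this model: no commit, so the write is dropped
print(len(store.rows))  # 0

tx.add_vertex(2, "person")
tx.commit()
print(len(store.rows))  # 1
```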

Robert Dale

(In reply to pikachu, Mon, Mar 27, 2017 at 9:46 AM)

--
You received this message because you are subscribed to the Google Groups "JanusGraph users list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


pikachu <vmo...@...>
 

Yes. I used graph.tx().commit() in gremlin.sh, and also graph.close().

I also tried Titan 1.0. The data exists until I reboot the system.

I also changed the HBase home directory so that it no longer points to the tmp directory on my local machine.

Could this problem be related to the Solr configuration?


Jerry He <jerr...@...>
 

Are you able to run all the graph operations successfully? Can you retrieve the vertices, or count them correctly, before the system restart?
I doubt it is related to the Solr configuration.

HBase is configured to use S3, but you changed the HBase home directory so that it no longer points to the tmp directory on your local machine? How?

Thanks,

Jerry


(In reply to pikachu, Monday, March 27, 2017 at 11:20:34 AM UTC-7)


pikachu <vmo...@...>
 

Yes, I am able to run all operations successfully. I tried on both the AWS server and my local machine. The AWS server stores HBase data in S3, and I configured the local machine to point to a directory in /home.
Is there anything I missed to make the data persist across reboots?

g.V() always returns an empty result after a reboot.


Jerry He <jerr...@...>
 

I am not sure what storage model you use for HBase on AWS.
If you are using this: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html
There is a section 'Shutting Down and Restoring a Cluster Without Data Loss'.
Basically, you want to flush the HBase table before you shut down the cluster. This is specific to this storage model for HBase on AWS.
I don't see why it happens on your local machine. The above step is not really needed on a local machine.
Try bringing up the 'hbase shell' and check whether the data is persisted by running a 'scan' or 'count' on the janusgraph table. Run a 'flush' on the table, then see whether the data survives a reboot on your local machine.

Thanks.

Jerry


(In reply to pikachu, Monday, March 27, 2017 at 8:26:03 PM UTC-7)


pikachu <vmo...@...>
 

Hi Jerry

Thanks for helping me out. 
When I manually run 'flush' on the table in the hbase shell, the data persists after a system reboot on my local machine. Is there any command to flush the table from Gremlin?

Thanks again


(In reply to Jerry He, Tuesday, March 28, 2017 at 9:49:52 AM UTC+5:30)


pikachu <vmo...@...>
 

Hi
It seems HBase automatically flushes to disk based on the configured flush interval and memstore size limit, so it's not a problem. Everything is working fine now.
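The automatic flushing pikachu describes can be illustrated with a toy Python model of a memstore that flushes either when its buffered size crosses a limit or when its oldest unflushed edit gets too old. The class, constants, and values are hypothetical; in HBase these two triggers correspond to settings such as hbase.hregion.memstore.flush.size and hbase.regionserver.optionalcacheflushinterval, whose actual defaults and behavior should be checked in the HBase reference guide.

```python
# Illustrative values only, not HBase defaults.
FLUSH_SIZE_BYTES = 1024    # plays the role of the memstore size limit
FLUSH_INTERVAL_MS = 60_000 # plays the role of the flush interval


class ToyMemstore:
    """Toy model of size- and age-triggered flushing; not HBase code."""

    def __init__(self):
        self.buffer = {}
        self.size = 0
        self.oldest_edit_ms = None  # time of the oldest unflushed edit
        self.store_files = []       # stands in for flushed files on disk

    def put(self, key, value, now_ms):
        if self.oldest_edit_ms is None:
            self.oldest_edit_ms = now_ms
        self.buffer[key] = value
        self.size += len(key) + len(value)
        self.tick(now_ms)

    def tick(self, now_ms):
        # Checked on every write here; a server would typically also
        # run this check periodically in the background.
        too_big = self.size >= FLUSH_SIZE_BYTES
        too_old = (self.oldest_edit_ms is not None
                   and now_ms - self.oldest_edit_ms >= FLUSH_INTERVAL_MS)
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self.buffer:
            self.store_files.append(dict(self.buffer))
        self.buffer.clear()
        self.size = 0
        self.oldest_edit_ms = None


m = ToyMemstore()
m.put("v1", "x", now_ms=1_000)           # small and fresh: stays in memory
m.tick(now_ms=30_000)                    # still younger than the interval
print(len(m.store_files))                # 0
m.tick(now_ms=61_000)                    # oldest edit is 60 s old: flushed
print(len(m.store_files))                # 1
m.put("big", "z" * 2000, now_ms=62_000)  # over the size limit: flushed at once
print(len(m.store_files))                # 2
```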

Thanks


(In reply to pikachu, Tuesday, March 28, 2017 at 10:43:22 AM UTC+5:30)


Jerry He <jerr...@...>
 

Good to know.

Manual flush on the table should not be required.
HBase keeps recent writes in memory, but they have also been written to the WAL (write-ahead log). When you reboot your system without properly shutting down HBase, HBase will recover from the WAL after the restart.
This may not work on the AWS instance with S3 storage, for apparent reasons.
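Jerry's explanation is the classic write-ahead-logging pattern: each write is appended to a durable log before it is applied in memory, so an unclean reboot loses only the in-memory copy, which is rebuilt by replaying the log on restart. The sketch below is a toy Python model with hypothetical names, not HBase code; real HBase recovery (WAL splitting per region, sequence IDs, etc.) is considerably more involved.

```python
class ToyRegionServer:
    """Toy model of write-ahead logging; not HBase code."""

    def __init__(self, wal, store_files):
        # wal and store_files stand in for files on disk: they are passed
        # in so they outlive any one server instance.
        self.wal = wal
        self.store_files = store_files
        self.memstore = {}  # in-memory only, lost on an unclean reboot
        self.replay_wal()

    def put(self, key, value):
        self.wal.append((key, value))  # 1. log first (durable)
        self.memstore[key] = value     # 2. then apply in memory

    def flush(self):
        # Persist the memstore; the logged edits are then redundant.
        if self.memstore:
            self.store_files.append(dict(self.memstore))
        self.memstore = {}
        self.wal.clear()

    def replay_wal(self):
        for key, value in self.wal:
            self.memstore[key] = value

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.store_files):  # newest file wins
            if key in f:
                return f[key]
        return None


disk_wal, disk_files = [], []
rs = ToyRegionServer(disk_wal, disk_files)
rs.put("v1", "person")  # logged and in memory, but never flushed
rs = None               # unclean reboot: in-memory state is lost
rs = ToyRegionServer(disk_wal, disk_files)
print(rs.get("v1"))     # person
```

The model also makes the caveat visible: recovery only works if the log itself survives the restart.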

Thanks,

Jerry


(In reply to pikachu, Tuesday, March 28, 2017 at 7:23:19 AM UTC-7)