Scalability issue with titan (0.5.4) and upgrade version from 0.5.4 to 1.0.0
Ted Wilmes <twi...@...>
Hi Ankit,
You will need to perform a data migration from 0.5.4 to Titan 1.0 or JanusGraph. Daniel Kuppitz and Stephen Mallette put together an example migration from 0.5.4 to 1.0.0 here: https://github.com/dkuppitz/openflights. You'll need to think carefully through how you'd do this in production, and I'd recommend many practice runs on test systems to get it right first! As for OLTP, I'd suggest profiling your system to see where the slowdown is. Maybe there are particular queries that need to be optimized, or possibly you can tune your Cassandra configuration. First things first, though: you'll need to figure out where the bottlenecks are.
--Ted
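One low-effort way to start the profiling suggested above is to time each API call at the application layer and compare percentiles per query. The following is only a minimal sketch using the Java standard library; `QueryTimer` and the query names are hypothetical and not part of Titan or TinkerPop, and it measures end-to-end wall-clock time only, so it shows which queries are slow but not whether the time is spent in Titan or in Cassandra.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical helper: records wall-clock latency samples per named query. */
final class QueryTimer {
    private final Map<String, List<Long>> samples = new ConcurrentHashMap<>();

    /** Runs the query, stores its elapsed time in nanoseconds, and returns its result. */
    <T> T time(String name, Supplier<T> query) {
        long start = System.nanoTime();
        try {
            return query.get();
        } finally {
            record(name, System.nanoTime() - start);
        }
    }

    /** Stores one latency sample (in nanoseconds) under the given query name. */
    void record(String name, long nanos) {
        samples.computeIfAbsent(name, k -> Collections.synchronizedList(new ArrayList<>()))
               .add(nanos);
    }

    /** Nearest-rank percentile (p in (0, 100]) of the samples for a query, or -1 if none. */
    long percentile(String name, double p) {
        List<Long> recorded = samples.get(name);
        if (recorded == null) {
            return -1;
        }
        List<Long> sorted;
        synchronized (recorded) {          // copy under the list's lock before sorting
            sorted = new ArrayList<>(recorded);
        }
        if (sorted.isEmpty()) {
            return -1;
        }
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

Each API handler would wrap its graph query, for example `timer.time("getFollowers", () -> runFollowersQuery(userId))`, where `runFollowersQuery` stands in for whatever the existing code already does. Comparing the 50th and 99th percentiles per query name over a day of traffic should show whether a few queries dominate the slowdown.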
On Monday, January 30, 2017 at 9:14:58 AM UTC-6, ankit tyagi wrote:
ankit tyagi <ankitty...@...>
Hi,
Currently, I am using Titan 0.5.4 to build a social graph out of user activities such as user1-->follows-->user2, user1-->likes-->object2, etc.
When I started using it, the number of activities, users, etc. was quite small, but it has been growing daily since then. To give approximate numbers, I have around 20 million user nodes, 50 million object nodes, and obviously billions of edges :).
I have exposed APIs on this social graph using the TinkerPop library in Java, e.g. get the followers or following of user1. I am using Cassandra (version 2.0.14) as the underlying database.
Throughput is around 50K rpm.
So, overall, loads of deletions, insertions, updates, and reads are happening on a daily basis.
Problems I am facing
If reads increase, load gets quite high on my machines (machine config: RAID-1, 1 TB disk, 32 cores, 64 GB RAM; 4 similar machines, replication factor 3, leveled compaction).
Response time/SLA of the APIs has also degraded over time.
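To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. It assumes "rpm" means API requests per minute and that data is spread evenly across the four nodes; `ClusterMath` is a hypothetical helper, not anything from Titan or Cassandra.

```java
/** Hypothetical helper for rough capacity arithmetic on the figures quoted above. */
final class ClusterMath {
    private ClusterMath() {}

    /** Converts a per-minute request rate into a per-second rate. */
    static double requestsPerSecond(double requestsPerMinute) {
        return requestsPerMinute / 60.0;
    }

    /**
     * Fraction of the full data set each node stores, assuming an even
     * distribution: replicationFactor / nodes, capped at 1.0.
     */
    static double dataFractionPerNode(int nodes, int replicationFactor) {
        return Math.min(1.0, (double) replicationFactor / nodes);
    }
}
```

With these assumptions, `ClusterMath.requestsPerSecond(50_000)` is roughly 833 requests per second, and `ClusterMath.dataFractionPerNode(4, 3)` is 0.75, meaning every machine holds about three quarters of the whole data set. Each API request may also fan out into several Cassandra reads, so the read rate at the storage layer is probably higher than the API rate.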
My Queries
Can I use Titan to serve this type of real-time query, or should I keep it only for analytics, to explore/recommend user data?
Do I need to change my Cassandra configuration or machine configuration to improve performance?
Any suggestions for scaling/serving this type of data?
If you need more explanation or any other details, please do ping me. I am quite stuck on this problem :/
To use JanusGraph, we need to migrate our current version to 1.0.0. Do we also need a data migration? I am getting the exception below while loading the graph with 1.0.0.
Required size [1] exceeds actual remaining size [0]
Display stack trace? [yN] y
java.lang.ArrayIndexOutOfBoundsException: Required size [1] exceeds actual remaining size [0]
at com.thinkaurelius.titan.diskstorage.util.StaticArrayBuffer.require(StaticArrayBuffer.java:80)
at com.thinkaurelius.titan.diskstorage.util.StaticArrayBuffer.getByte(StaticArrayBuffer.java:156)
at com.thinkaurelius.titan.diskstorage.util.ReadArrayBuffer.getByte(ReadArrayBuffer.java:67)
I am using the configuration below to load the graph.
ids.block-size=100000
storage.cassandra.keyspace=lgpgels
storage.backend=cassandra
storage.hostname=lgp1,lgp2,lgp3,lgp4
index.gelssearch.hostname=lgp1,lgp2,lgp3,lgp4
index.gelssearch.backend=elasticsearch
index.gelssearch.index-name=lgpgels
index.gelssearch.elasticsearch.cluster-name=lgp