Scalability issue with current Titan version (0.5.4) and upgrading to 1.0.0


ankit tyagi <ankitty...@...>
 

Hi,

Currently, I am using Titan 0.5.4 to build a social graph from user activities (user1-->follows-->user2, user1-->likes-->object2, etc.).

When I first started using it, the number of activities, users, etc. was quite small, but it has been growing daily since then. To give approximate numbers: I have around 20 million user nodes, 50 million object nodes, and obviously billions of edges :).

I have exposed APIs on this social graph using the TinkerPop library in Java, e.g. get followers/following of user1. I am using Cassandra (version 2.0.14) as the underlying database.
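For example, a typical followers query looks roughly like this in TinkerPop 3 Gremlin (a sketch only; the "userId" property key and "follows" edge label are assumptions about my schema):

     // followers of user1: traverse incoming 'follows' edges
     g.V().has("userId", "user1").in("follows")
     // users that user1 follows: traverse outgoing 'follows' edges
     g.V().has("userId", "user1").out("follows")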

Throughput is around 50K requests per minute.

So overall, lots of deletions, insertions, updates, and reads are happening on a daily basis.

Problems I am facing:
- When reads increase, the load gets quite high on my machines (machine config: RAID-1, 1 TB disk, 32 cores, 64 GB RAM, 4 similar machines, replication factor 3, leveled compaction).
- Response time/SLA of the APIs has also degraded over time.
My queries:
- Can I use Titan for serving these kinds of real-time queries, or should I keep it only for analytics purposes to explore/recommend user data?
- Do I need to change my Cassandra configuration or machine configuration to improve performance?
- Any suggestions for scaling/serving this type of data?
For more explanation or anything else, please do ping me. I am quite stuck on this problem.

To use JanusGraph, we need to migrate our version to 1.0.0. Does this upgrade also require a data migration? I am getting the below exception when I try to load data with version 1.0.0:

Display stack trace? [yN] y
java.lang.ArrayIndexOutOfBoundsException: Required size [1] exceeds actual remaining size [0]
at com.thinkaurelius.titan.diskstorage.util.StaticArrayBuffer.require(StaticArrayBuffer.java:80)
at com.thinkaurelius.titan.diskstorage.util.StaticArrayBuffer.getByte(StaticArrayBuffer.java:156)
at com.thinkaurelius.titan.diskstorage.util.ReadArrayBuffer.getByte(ReadArrayBuffer.java:67)
at com.thinkaurelius.titan.graphdb.database.idhandling.VariableLong.readUnsigned(VariableLong.java:34)
at com.thinkaurelius.titan.graphdb.database.idhandling.VariableLong.readPositive(VariableLong.java:80)


I am using the below configuration to load the graph:
     ids.block-size=100000
     storage.cassandra.keyspace=lgpgels
     storage.backend=cassandra  
     storage.hostname=lgp1,lgp2,lgp3,lgp4
     index.gelssearch.hostname=lgp1,lgp2,lgp3,lgp4
     index.gelssearch.backend=elasticsearch
     index.gelssearch.index-name=lgpgels
     index.gelssearch.elasticsearch.cluster-name=lgp
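
For reference, my understanding from the JanusGraph migration docs is that one points JanusGraph at the existing Titan keyspace and enables the upgrade flag; something like the sketch below, reusing my keyspace and hostnames (I have not verified that this works against my cluster):

     # sketch of a JanusGraph-side config for reading a Titan keyspace
     # graph.allow-upgrade is the documented flag for upgrading Titan data
     graph.allow-upgrade=true
     storage.backend=cassandra
     storage.hostname=lgp1,lgp2,lgp3,lgp4
     storage.cassandra.keyspace=lgpgels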