
Disabling Indexing Backend

Chris Ruppelt <chris....@...>
 

When initializing JanusGraph, is there a way to disable the indexing backend? The code below always assumes Elasticsearch is used.

       Configuration c = new BaseConfiguration();
       c.setProperty("gremlin.graph", "org.janusgraph.core.JanusGraphFactory");
       c.setProperty("storage.backend", "cassandrathrift");
       Graph graph = GraphFactory.open(c);


Thanks
Chris R
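
As far as I can tell, JanusGraph only initializes an index backend when an index.[name].backend key is present, so a storage-only configuration like the one above should not, by itself, pull in Elasticsearch. For contrast, a sketch of the kind of lines that do enable it (values are placeholders):

       // An index backend is opt-in; properties like these are what enable Elasticsearch.
       c.setProperty("index.search.backend", "elasticsearch");
       c.setProperty("index.search.hostname", "127.0.0.1");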


Re: JanusGraph as a replacement for NoSQL databases in Web Applications for non-bulk randomly generated data

Jane <shabs...@...>
 

So by separating the data into two different databases, traversals on vertex properties would not be possible without external lookups.

Are there any specific optimizations I could do, given that I would also like to be able to use the OLAP side of JanusGraph?


On Tuesday, June 20, 2017 at 3:29:16 PM UTC-4, Robert Dale wrote:
You could keep your schema-less data in your existing NoSQL db and then add a graph database for storing relations.

Or try a multi-model graph database. There are several listed here: http://tinkerpop.apache.org/#graph-systems

There's no one right way to do it.

Robert Dale

On Tue, Jun 20, 2017 at 2:32 PM, Jane <sh...@...> wrote:
Can I use, or has anyone used, JanusGraph as a replacement for NoSQL databases in web applications? I'm talking about high throughput on both reads and writes (non-bulk, randomly generated by user traffic). A lot of the examples getting tossed around here are just people exporting their data from somewhere else and bulk-importing it into JanusGraph for analysis.

A blocking issue for me is that, for JanusGraph to perform even remotely decently, it seems to require an index and schema to be pre-generated for the data set. With my use case it is simply not possible to determine everything a user might want to upload to the database, which is why we use a NoSQL database. However, the data is also highly relational and would benefit from being stored in a graph database.

Has anyone gotten over this hurdle, or have any advice?



Re: how to load a CSV file into JanusGraph

Elizabeth <hlf...@...>
 

Hi Marc,

Thanks so much for the information. However, I was wondering: is there a complete code example of how to do bulk loading in JanusGraph without Hadoop?


Thanks again!
Elis

On Thursday, June 15, 2017 at 9:59:10 PM UTC+8, HadoopMarc wrote:
Hi Elizabeth,

For JanusGraph you should also take into account the TinkerPop documentation. A relevant pointer for you is:
https://groups.google.com/forum/#!searchin/gremlin-users/csv%7Csort:relevance/gremlin-users/AetuGcLiBxo/KW966WAyAQAJ

Cheers,    Marc

On Wednesday, June 14, 2017 at 18:44:16 UTC+2, Elizabeth wrote:
Hi all,

I am new to JanusGraph; I have dived into the JanusGraph docs for almost two weeks and found nothing.
I could only gather scattered information, and most of the time it prompted errors.
Could anyone supply a complete example of bulk loading, or of loading a CSV file into JanusGraph, please?
Any little help is appreciated!

Best regards,

Elis.
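
A minimal sketch of what a non-Hadoop loader can look like in the Gremlin console (the file layout, label, and property names are assumptions, not a tested recipe; setting storage.batch-loading=true in the graph configuration helps):

    // Sketch: load person.csv (header: id,name,age) without Hadoop,
    // committing in batches to keep transactions small.
    graph = JanusGraphFactory.open('conf/janusgraph-cassandra.properties')
    g = graph.traversal()
    new File('person.csv').eachLine { line, n ->
        if (n == 1) return                        // skip the header row
        def (id, name, age) = line.split(',').toList()
        g.addV('person').
          property('personId', id).
          property('name', name).
          property('age', age as Integer).iterate()
        if (n % 10000 == 0) graph.tx().commit()   // batch commit
    }
    graph.tx().commit()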


How can I keep the vertex I want to add unique?

huu...@...
 

Hi, all:

       How can I ensure that a vertex I want to add is unique? Get, then add? Are there any other methods for adding a unique vertex?
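
One common pattern, sketched in the console (the key, label, and index names are assumptions): declare a unique composite index on the identifying key, then "get or create" inside a transaction:

    // Sketch: unique composite index on 'userId' plus a get-or-create lookup.
    mgmt = graph.openManagement()
    userId = mgmt.makePropertyKey('userId').dataType(String.class).make()
    mgmt.buildIndex('byUserIdUnique', Vertex.class).addKey(userId).unique().buildCompositeIndex()
    mgmt.commit()

    g = graph.traversal()
    v = g.V().has('userId', 'u-123').tryNext().orElseGet {
            g.addV('user').property('userId', 'u-123').next()
        }
    graph.tx().commit()   // the unique index makes concurrent duplicates fail at commit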


When will JanusGraph support ES 5.x?

huu...@...
 

Hi, all 

        We want to use JanusGraph in production, but our system uses HBase 1.2.x and ES 5.3.0, so I want to know when JanusGraph will support ES 5.x. Will it be in the next version, 0.2.0?


Re: JanusGraph as a replacement for NoSQL databases in Web Applications for non-bulk randomly generated data

Robert Dale <rob...@...>
 

You could keep your schema-less data in your existing NoSQL db and then add a graph database for storing relations.

Or try a multi-model graph database. There are several listed here: http://tinkerpop.apache.org/#graph-systems

There's no one right way to do it.

Robert Dale

On Tue, Jun 20, 2017 at 2:32 PM, Jane <shabs...@...> wrote:
Can I use, or has anyone used, JanusGraph as a replacement for NoSQL databases in web applications? I'm talking about high throughput on both reads and writes (non-bulk, randomly generated by user traffic). A lot of the examples getting tossed around here are just people exporting their data from somewhere else and bulk-importing it into JanusGraph for analysis.

A blocking issue for me is that, for JanusGraph to perform even remotely decently, it seems to require an index and schema to be pre-generated for the data set. With my use case it is simply not possible to determine everything a user might want to upload to the database, which is why we use a NoSQL database. However, the data is also highly relational and would benefit from being stored in a graph database.

Has anyone gotten over this hurdle, or have any advice?



Re: Sample project on Janus Graph

Misha Brukman <mbru...@...>
 

Hi Yashpal,

Here's some sample code in Java: https://github.com/pluradj/janusgraph-java-example

Misha

On Mon, Jun 19, 2017 at 6:26 AM, Yashpal Singh <yadhuva...@...> wrote:
Hi All,

I am new to graph DBs, so it would be really helpful if there were some sample projects available on git.



Re: Possibility of index out of sync with graph

Austin Sharp <austins...@...>
 

Yes, it can happen with ES and Cassandra in some error-handling cases:
https://github.com/JanusGraph/janusgraph/issues/281

Hopefully the new ES implementation that is now in master doesn't have this problem.


On Monday, June 19, 2017 at 7:22:32 AM UTC-7, Adam Holley wrote:
Is it possible for the index (either Elasticsearch or Solr) to be out of sync with the store (Cassandra or HBase), or is the store commit contingent on the index commit? If it is possible, what conditions would cause them to be out of sync, and what can be done to avoid it?
Thanks.
Adam.


JanusGraph as a replacement for NoSQL databases in Web Applications for non-bulk randomly generated data

Jane <shabs...@...>
 

Can I use, or has anyone used, JanusGraph as a replacement for NoSQL databases in web applications? I'm talking about high throughput on both reads and writes (non-bulk, randomly generated by user traffic). A lot of the examples getting tossed around here are just people exporting their data from somewhere else and bulk-importing it into JanusGraph for analysis.

A blocking issue for me is that, for JanusGraph to perform even remotely decently, it seems to require an index and schema to be pre-generated for the data set. With my use case it is simply not possible to determine everything a user might want to upload to the database, which is why we use a NoSQL database. However, the data is also highly relational and would benefit from being stored in a graph database.

Has anyone gotten over this hurdle, or have any advice?


creating a vertex with a LIST property in a single gremlin statement

Peter Musial <pmmu...@...>
 

Hi All,

(first entry, so please be patient)

What follows is more of a Gremlin question. I have a JanusGraph schema with a property called status, with cardinality LIST.

status = mgmt.makePropertyKey('status').dataType(String.class).cardinality(Cardinality.LIST).make();

The documentation suggests doing this:

myVertex = graph.addVertex(label,'myVertex')
myVertex.property('status', 'HI')
myVertex.property('status', 'BYE')

which of course works as advertised. However, I found that the following shorthand also works in JanusGraph 0.1.1 (but not in 0.1.0):

graph.addVertex(label,'myVertex', 'status', 'HI', 'status', 'BYE')

Can someone help explain whether this is supported syntax or simply syntactic sugar? Also, I cannot find any documentation on the key differences between JanusGraph 0.1.0 and 0.1.1 with respect to Gremlin.

$ echo $CASSANDRA_HOME
/path/janusgraph/apache-cassandra-2.1.17
$ which janusgraph.sh
/path/janusgraph/janusgraph-0.1.1-hadoop2/bin/janusgraph.sh
$ which gremlin.sh
/path/janusgraph/janusgraph-0.1.1-hadoop2/bin/gremlin.sh

Thank you,

Peter
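
Incidentally, a quick console check that both list values landed (continuing the example above):

    myVertex.properties('status')   // ==> vp[status->HI], vp[status->BYE]
    myVertex.values('status')       // ==> HI, BYE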


Re: HBase unbalanced table regions after bulkload

aoz...@...
 

Hi Marco,

Thanks for the interest.

I think I cannot use explicit partitioning because, as far as I know, to use explicit partitioning one should set ids.flush = false. Whenever I do that I get a FastNoSuchElementException and the bulk load fails. If I leave the parameter as true, the bulk load succeeds, i.e. the left and right sides of my edges are symmetrical.

No research has helped me understand the issue, so I am really stuck and still have both loading and query performance problems unresolved.

Any help is appreciated.

Best,
Ali


On Friday, June 16, 2017 at 21:22:09 UTC+3, HadoopMarc wrote:

Hi Ali,

Thanks for returning feedback about your experiments.

The docs have a section on graph partitioning, warning about too large a number of partitions and suggesting explicit partitioning if performance counts. They also suggest taking twice the number of region server instances as the region count.

http://docs.janusgraph.org/latest/graph-partitioning.html

HTH,   Marc

On Friday, June 16, 2017 at 16:42:55 UTC+2, Ali ÖZER wrote:
Hi Marco,

I think it has nothing to do with the region count, and HBase does not ignore any region under any circumstances. Since my regions are imbalanced (only 100 regions have data in them), the data size per region is not outputsize/1024; it is outputsize/100.

My output size is not equal to my input size; it is 330 GB, so the data size per region is not 130 MB but 3 GB. Nevertheless, as I said, I do not think the problem is the region count, because I managed to improve the balance of my regions as follows:

I learned that the whole thing is about the cluster.max-partitions parameter. Its default value is 32; I set it to 256, changed nothing else, re-ran the BulkLoaderVertexProgram, and saw that the non-empty region count increased from 100 to 256. (When the parameter was 32, BLVP loaded data into only 32 regions; HBase then automatically split the oversized regions, and the number of non-empty regions became 100.) So I realized that in order to fill all 1024 of my regions I need to set cluster.max-partitions to 1024.

However, there is one problem: when I increased cluster.max-partitions from 32 to 256, the runtime of my BulkLoaderVertexProgram increased five-fold. I was able to load the whole data set in almost 2 hours; now it takes almost 10. I think it is because each Spark executor tries to write to all 1024 regions at once, and I have 1024 Spark executors, which means a lot of network traffic (1024*1024).

Since I do not know the internals of BLVP and JanusGraph id assignment, I am not one hundred percent sure about all this.

If somebody knows the internals of JanusGraph, I would really appreciate the help; I am pretty sure that knowledge would help me solve the problem.

Best,
Ali

On Friday, June 16, 2017 at 14:55:00 UTC+3, mar...@... wrote:
Hi Ali,

OK, I overlooked your config line "storage.hbase.region-count=1024". This is far too large a number, since HBase likes regions with a size on the order of 10 GB, rather than the 130 MB you requested. This 10 GB region split is probably an HBase global setting. It could be a similar number in your HBase cluster, so HBase just ignores the superfluous regions, unless you configure the region split to a lower number manually (not advised).

Other comments from HBase experts are welcome (I do not consider myself one).

Cheers,    Marc

On Friday, June 16, 2017 at 09:57:12 UTC+2, Ali ÖZER wrote:
Hi Marc,

As far as I know, even if YARN schedules executors unevenly, that does not mean the data written across HBase will be uneven.

Data is written to HBase according to the key of the datum and the key ranges of the regions; it has nothing to do with the node the writer JVM runs on.

My executors are running on 90% of my nodes (not that uneven), yet 90% of my regions are empty (900 of 1024 regions). If you were right, the latter percentage would be 10% instead of 90%.

If there is some other mechanism for assigning ids in a distributed fashion, please keep me updated and elaborate on the mechanism.

Best,
Ali

On Thursday, June 15, 2017 at 22:51:14 UTC+3, HadoopMarc wrote:

Hi Ali,

I have never tried to optimize this myself, but maybe you should also look into the docs at

12.3.30. storage.hbase

...

storage.hbase.region-count: the number of initial regions set when creating JanusGraph's HBase table. Integer, no default value, MASKABLE.

storage.hbase.regions-per-server: the number of regions per regionserver to set when creating JanusGraph's HBase table. Integer, no default value, MASKABLE.



Normally, HBase does not want many regions, but the number of regions times the HDFS replication factor should be at least the number of active datanodes for maximum performance. I think some imbalance is inevitable, as YARN will schedule executors unevenly and each executor will try to have local data access.

Further, you can look into HBase's region load balancer configuration, which enables HBase to move regions automatically.

HTH,    Marc

On Thursday, June 15, 2017 at 16:04:28 UTC+2, Ali ÖZER wrote:
We are using Cloudera 5.7.0 with Java 1.8.0_74, and we have Spark 1.6.0, JanusGraph 0.1.1, and HBase 1.2.0.

I managed to bulk-load 130 GB of data into a 1024-region HBase table in 2 hours 30 minutes with 1024 Spark executors (1 core, 20 GB memory each). Each stage of BLVP is configured to run 10240 tasks:
readGraph.compute(SparkGraphComputer).workers(10240).program(blvp).submit().get().

However, I am unable to distribute the HBase data evenly across regions; they are pretty imbalanced. I suspect it is related to the conf value of ids.num-partitions.

Here is how I set the conf:

conf/my-janusgraph-hbase.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
storage.hbase.region-count=1024

ids.block-size=2000000
ids.renew-timeout=3600000
storage.buffer-size=10000
ids.num-partitions=1024
ids.partition=true

storage.hbase.table=myjanus
storage.hostname=x.x.x.x
cache.db-cache=true
cache.db-cache-clean-wait=20
cache.db-cache-time=180000
cache.db-cache-size=0.5

I even tried setting ids.num-partitions=10240; however, the problem was not solved.

Should I increase ids.num-partitions to an even higher value like 102400?
What is the difference between cluster.max-partitions and ids.num-partitions? Is my problem related to cluster.max-partitions? Should I use it?

As far as I know, the ids.num-partitions value determines the number of randomly gathered prefixes used in assigning ids to elements. I read somewhere that setting ids.num-partitions to 10 times the region count would be enough; however, that seems not to be the case, and I do not want to increase ids.num-partitions further. Since I could not find any documentation on the internals of cluster.max-partitions, I am really ignorant of it and need some help.

Thanks in advance,
Best,
Ali





MixedIndex naming convention

Ravikumar Govindarajan <ravikumar....@...>
 

I saw in many places in the documentation/tutorials that mixed indexes are built like this:

mgmt.buildIndex("vertices", Vertex.class).addKey(key).buildMixedIndex(INDEX_NAME); // With INDEX_NAME ='search', mostly


Will this create one index in Elasticsearch/Solr per property, or are all properties clubbed under a single index?


Also, does JanusGraph partition these text-based indexes in case they get too large?


--

Ravi
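
As I understand it, all keys added to the same buildIndex() call end up in one backend index, and the INDEX_NAME argument ('search') selects the index.search.* backend configuration. A console sketch (key names are assumptions):

    // Sketch: both keys land in the single mixed index 'vertices',
    // served by the backend configured under index.search.*.
    mgmt = graph.openManagement()
    name = mgmt.makePropertyKey('name').dataType(String.class).make()
    age  = mgmt.makePropertyKey('age').dataType(Integer.class).make()
    mgmt.buildIndex('vertices', Vertex.class).addKey(name).addKey(age).buildMixedIndex('search')
    mgmt.commit()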




Re: Visual Viewer for Graph...

Ravikumar Govindarajan <ravikumar....@...>
 

Thanks for the info

--
Ravi

On Monday, June 19, 2017 at 8:13:32 PM UTC+5:30, HadoopMarc wrote:
Hi Ravikumar,

TinkerPop provides a quick start for graph visualization with the Gephi plugin for the gremlin console.

Cheers,    Marc

On Monday, June 19, 2017 at 15:31:18 UTC+2, Ravikumar Govindarajan wrote:
Is there a visual viewer for JanusGraph?

I am just beginning & would like to visually see the graph I created, to understand & analyze a few things. The backend Cassandra storage is too cryptic & I can't make head or tail of it!!

Thanks & Regards
Ravi


Re: Visual Viewer for Graph...

Jeremy Hanna <jeremy....@...>
 

There are a few with TinkerPop driver support as well. Cambridge Intelligence's KeyLines is a JavaScript library that queries the graph directly with the TinkerPop Java driver. I believe Tom Sawyer's Perspectives also queries the graph directly; I think they're now TinkerPop 3 enabled, and it keeps the result in cache so you can do different things with it. Linkurious exports the graph (so it needs to be small) but allows for neat interactive visualizations; there's an application you can use along with an API. Gephi works for small datasets and has built-in support, as Marc said.

On Jun 19, 2017, at 9:43 AM, HadoopMarc <m.c.d...@...> wrote:

Hi Ravikumar,

TinkerPop provides a quick start for graph visualization with the Gephi plugin for the gremlin console.

Cheers,    Marc

On Monday, June 19, 2017 at 15:31:18 UTC+2, Ravikumar Govindarajan wrote:
Is there a visual viewer for JanusGraph?

I am just beginning & would like to visually see the graph I created, to understand & analyze a few things. The backend Cassandra storage is too cryptic & I can't make head or tail of it!!

Thanks & Regards
Ravi



Re: Cassandra/HBase storage backend issues

HadoopMarc <m.c.d...@...>
 

Hi Mike,

Seeing no expert answers up to now, I can only give a general reply. I see the following lines of thinking to explain your situation:
  • HBase fails to provide row-based consistency: extremely unlikely, given the many applications that rely on this.
  • JanusGraph fails to provide consistency between instances (e.g. using out-of-date caches). Do you use multiple JanusGraph instances, or multiple threads that access the same JanusGraph instance?
  • Your application fails to handle exceptions in the right way (e.g. ignores them).
  • Your application has logic faults: not so likely, because you have been debugging for a while.
If you want to proceed with this, could you put the code you use on GitHub, so others can confirm the behavior and/or inspect the configs? Ideally, you would provide your code in the form of a test like:
 https://github.com/JanusGraph/janusgraph/blob/236dd930a7af35061e393ea8bb1ee6eb65f924b2/janusgraph-hbase-parent/janusgraph-hbase-core/src/test/java/org/janusgraph/graphdb/hbase/HBasePartitionGraphTest.java

Other ideas still welcome!

Marc

On Sunday, June 18, 2017 at 08:38:02 UTC+2, mi...@... wrote:

Hi! I'm running into an issue and wondering if anyone has tips. I'm using HBase (I also tried Cassandra, with the same issue) and finding that preprocessing our data yields inconsistent results. We run through a query, and for each vertex with a given property we run a traversal on it and calculate properties or insert edges that weren't inserted on upload, to boost the performance of our eventual traversal.

Our tests run perfectly with a TinkerGraph, but with the HBase or Cassandra backend, sometimes the tests fail, sometimes the calculated properties are completely wrong, and sometimes edges aren't created when needed. A preprocessing task may depend on the output of a previous preprocessing task that took place seconds earlier. I think this is caused by eventual consistency breaking the traversal, but I'm not sure how to get 100% accuracy (where the current preprocessing task can be 100% confident it gets the correct value from a previous preprocessing task).

I create a transaction for each preprocessing operation, then commit it once successful, but this doesn't seem to fix the issues. Any ideas?

Thanks,
Mike


Re: Visual Viewer for Graph...

HadoopMarc <m.c.d...@...>
 

Hi Ravikumar,

TinkerPop provides a quick start for graph visualization with the Gephi plugin for the gremlin console.

Cheers,    Marc

On Monday, June 19, 2017 at 15:31:18 UTC+2, Ravikumar Govindarajan wrote:

Is there a visual viewer for JanusGraph?

I am just beginning & would like to visually see the graph I created, to understand & analyze a few things. The backend Cassandra storage is too cryptic & I can't make head or tail of it!!

Thanks & Regards
Ravi


Possibility of index out of sync with graph

Adam Holley <holl...@...>
 

Is it possible for the index (either Elasticsearch or Solr) to be out of sync with the store (Cassandra or HBase), or is the store commit contingent on the index commit? If it is possible, what conditions would cause them to be out of sync, and what can be done to avoid it?
Thanks.
Adam.


Visual Viewer for Graph...

Ravikumar Govindarajan <ravikumar....@...>
 

Is there a visual viewer for JanusGraph?

I am just beginning & would like to visually see the graph I created, to understand & analyze a few things. The backend Cassandra storage is too cryptic & I can't make head or tail of it!!

Thanks & Regards
Ravi


Sample project on Janus Graph

Yashpal Singh <yadhuva...@...>
 

Hi All,

I am new to graph DBs, so it would be really helpful if there were some sample projects available on git.


Cassandra/HBase storage backend issues

mikeo...@...
 

Hi! I'm running into an issue and wondering if anyone has tips. I'm using HBase (I also tried Cassandra, with the same issue) and finding that preprocessing our data yields inconsistent results. We run through a query, and for each vertex with a given property we run a traversal on it and calculate properties or insert edges that weren't inserted on upload, to boost the performance of our eventual traversal.

Our tests run perfectly with a TinkerGraph, but with the HBase or Cassandra backend, sometimes the tests fail, sometimes the calculated properties are completely wrong, and sometimes edges aren't created when needed. A preprocessing task may depend on the output of a previous preprocessing task that took place seconds earlier. I think this is caused by eventual consistency breaking the traversal, but I'm not sure how to get 100% accuracy (where the current preprocessing task can be 100% confident it gets the correct value from a previous preprocessing task).

I create a transaction for each preprocessing operation, then commit it once successful, but this doesn't seem to fix the issues. Any ideas?

Thanks,
Mike
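
For reference, the transaction-per-operation pattern described above looks roughly like this in JanusGraph (preprocess() is a placeholder for the application logic):

    // Sketch: scope one preprocessing step to one transaction; later steps
    // only see its writes after commit() succeeds.
    tx = graph.newTransaction()
    try {
        preprocess(tx)    // hypothetical application step, reads/writes via tx
        tx.commit()
    } catch (Exception e) {
        tx.rollback()
        throw e
    }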
