Date   

Re: Use index for sorting

toom <to...@...>
 

The problem of using custom value for null is that we need to choose a value for each data type, and hope that nobody will try to use this particular value. I suppose it is feasible for data type like string, date or double but not for boolean.

Toom.

On Friday, December 4, 2020 at 2:22:02 AM UTC+1 li...@... wrote:
No, null support is an optional feature for graph providers. JanusGraph does not allow null value and I don’t think it will be supported (in near future).

Apart from the solution suggested by Marc, is it possible for you to come up with some custom value to represent null?

Best regards,
Boxuan

「toom <t...@...>」在 2020年12月4日 週五,上午3:17 寫道:

Hi Marc,

Thank you for your response.
If I understand correctly, with TinkerPop 3.5 I will be able to sort on property with missing values. It is a good news.
Do you know it JanusGraph 0.6.0 will be release with that version ?

Regarding the impact of the step order on index use, I wrote a strategy [1] that put HasStep and OrderStep before FilterStep if they follow a GraphStep.

Best regards,

Toom.

On Thursday, December 3, 2020 at 8:17:27 AM UTC+1 HadoopMarc wrote:
Hi Toom,

No solution, but the exception that you mention comes from TinkerPop:

Apparently, you want all selected vertices, including the ones with null properties, so I would wait for TinkerPop 3.5 and in the mean time use your own workaround for a single filter criterion and do the ordering outside gremlin for more complex sets of filtering criteria.

Best wishes,      Marc

Op woensdag 2 december 2020 om 08:13:05 UTC+1 schreef t...@...:
Hello,

I'm using JanusGraph with Cassandra (0.5.2) and ElasticSearch.

I try to optimize my queries and use the mixed indexes as much as possible, in particular for sortings, but I have some difficulties:

It is not possible to sort by properties that can have missing values (or I get a "The property does not exist as the key has no associated value for the provided element"). Therefore I used ".order().by(coalesce(values('closingDate'), new Date()))" but in this case, the index is not used.

If there is only one sorting criterion, I probably can do something like:

g.inject(1).union(
  g.V().hasLabel('Case').has('closingDate').order().by('closingDate'),
  g.V().hasLabel('Case').hasNot('closingDate'))

But what is my best option if I want to use several criteria?


I also note that the FilterRankingStrategy strategy can have negative effect on performance when there are filters that don't use  index. For example, the following query is faster without step reordering.

g.V().hasLabel('Case').has('closingDate').order().by('closingDate').filter(out('attachment').has('file'))

FilterRanking swaps order() and filter() steps and then index is not used for sorting.

Any help will be much appreciated.

Toom.

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/c6bcb878-bfc2-4d8f-a3ee-38cc214382bcn%40googlegroups.com.


Re: Use index for sorting

"Li, Boxuan" <libo...@...>
 

No, null support is an optional feature for graph providers. JanusGraph does not allow null value and I don’t think it will be supported (in near future).

Apart from the solution suggested by Marc, is it possible for you to come up with some custom value to represent null?

Best regards,
Boxuan

「toom <to...@...>」在 2020年12月4日 週五,上午3:17 寫道:


Hi Marc,

Thank you for your response.
If I understand correctly, with TinkerPop 3.5 I will be able to sort on property with missing values. It is a good news.
Do you know it JanusGraph 0.6.0 will be release with that version ?

Regarding the impact of the step order on index use, I wrote a strategy [1] that put HasStep and OrderStep before FilterStep if they follow a GraphStep.

Best regards,

Toom.

On Thursday, December 3, 2020 at 8:17:27 AM UTC+1 HadoopMarc wrote:
Hi Toom,

No solution, but the exception that you mention comes from TinkerPop:

Apparently, you want all selected vertices, including the ones with null properties, so I would wait for TinkerPop 3.5 and in the mean time use your own workaround for a single filter criterion and do the ordering outside gremlin for more complex sets of filtering criteria.

Best wishes,      Marc

Op woensdag 2 december 2020 om 08:13:05 UTC+1 schreef t...@...:
Hello,

I'm using JanusGraph with Cassandra (0.5.2) and ElasticSearch.

I try to optimize my queries and use the mixed indexes as much as possible, in particular for sortings, but I have some difficulties:

It is not possible to sort by properties that can have missing values (or I get a "The property does not exist as the key has no associated value for the provided element"). Therefore I used ".order().by(coalesce(values('closingDate'), new Date()))" but in this case, the index is not used.

If there is only one sorting criterion, I probably can do something like:

g.inject(1).union(
  g.V().hasLabel('Case').has('closingDate').order().by('closingDate'),
  g.V().hasLabel('Case').hasNot('closingDate'))

But what is my best option if I want to use several criteria?


I also note that the FilterRankingStrategy strategy can have negative effect on performance when there are filters that don't use  index. For example, the following query is faster without step reordering.

g.V().hasLabel('Case').has('closingDate').order().by('closingDate').filter(out('attachment').has('file'))

FilterRanking swaps order() and filter() steps and then index is not used for sorting.

Any help will be much appreciated.

Toom.

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgra...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/c6bcb878-bfc2-4d8f-a3ee-38cc214382bcn%40googlegroups.com.


Re: Use index for sorting

toom <to...@...>
 


Hi Marc,

Thank you for your response.
If I understand correctly, with TinkerPop 3.5 I will be able to sort on property with missing values. It is a good news.
Do you know it JanusGraph 0.6.0 will be release with that version ?

Regarding the impact of the step order on index use, I wrote a strategy [1] that put HasStep and OrderStep before FilterStep if they follow a GraphStep.

Best regards,

Toom.

[1] https://gist.github.com/To-om/e7e406d85dcdd65bc2a533960b50a1d2

On Thursday, December 3, 2020 at 8:17:27 AM UTC+1 HadoopMarc wrote:
Hi Toom,

No solution, but the exception that you mention comes from TinkerPop:

Apparently, you want all selected vertices, including the ones with null properties, so I would wait for TinkerPop 3.5 and in the mean time use your own workaround for a single filter criterion and do the ordering outside gremlin for more complex sets of filtering criteria.

Best wishes,      Marc

Op woensdag 2 december 2020 om 08:13:05 UTC+1 schreef t...@...:
Hello,

I'm using JanusGraph with Cassandra (0.5.2) and ElasticSearch.

I try to optimize my queries and use the mixed indexes as much as possible, in particular for sortings, but I have some difficulties:

It is not possible to sort by properties that can have missing values (or I get a "The property does not exist as the key has no associated value for the provided element"). Therefore I used ".order().by(coalesce(values('closingDate'), new Date()))" but in this case, the index is not used.

If there is only one sorting criterion, I probably can do something like:

g.inject(1).union(
  g.V().hasLabel('Case').has('closingDate').order().by('closingDate'),
  g.V().hasLabel('Case').hasNot('closingDate'))

But what is my best option if I want to use several criteria?


I also note that the FilterRankingStrategy strategy can have negative effect on performance when there are filters that don't use  index. For example, the following query is faster without step reordering.

g.V().hasLabel('Case').has('closingDate').order().by('closingDate').filter(out('attachment').has('file'))

FilterRanking swaps order() and filter() steps and then index is not used for sorting.

Any help will be much appreciated.

Toom.


Re: Use index for sorting

HadoopMarc <bi...@...>
 

Hi Toom,

No solution, but the exception that you mention comes from TinkerPop:

Apparently, you want all selected vertices, including the ones with null properties, so I would wait for TinkerPop 3.5 and in the mean time use your own workaround for a single filter criterion and do the ordering outside gremlin for more complex sets of filtering criteria.

Best wishes,      Marc

Op woensdag 2 december 2020 om 08:13:05 UTC+1 schreef t...@...:

Hello,

I'm using JanusGraph with Cassandra (0.5.2) and ElasticSearch.

I try to optimize my queries and use the mixed indexes as much as possible, in particular for sortings, but I have some difficulties:

It is not possible to sort by properties that can have missing values (or I get a "The property does not exist as the key has no associated value for the provided element"). Therefore I used ".order().by(coalesce(values('closingDate'), new Date()))" but in this case, the index is not used.

If there is only one sorting criterion, I probably can do something like:

g.inject(1).union(
  g.V().hasLabel('Case').has('closingDate').order().by('closingDate'),
  g.V().hasLabel('Case').hasNot('closingDate'))

But what is my best option if I want to use several criteria?


I also note that the FilterRankingStrategy strategy can have negative effect on performance when there are filters that don't use  index. For example, the following query is faster without step reordering.

g.V().hasLabel('Case').has('closingDate').order().by('closingDate').filter(out('attachment').has('file'))

FilterRanking swaps order() and filter() steps and then index is not used for sorting.

Any help will be much appreciated.

Toom.


Re: Run JanusGraph/Cassandra/ElasticSearch in production in a VM with 8GB/RAM

HadoopMarc <bi...@...>
 

Hi, it does not seem impossible but you should really test it and ask yourself the questions:
  • do your really need gremlin to query your data (instead of SQL)?
  • do you really need Elasticsearch for non-equality matches (read about CompositeIndex and MixedIndex)?
  • for incidental gremlin queries isn't it enough to load the SQL database into a Tinkergraph when needed?
  • what does this Marc really know?
Best wishes,    Marc

Op donderdag 3 december 2020 om 00:58:03 UTC+1 schreef p...@...:

Hi, is possible to Run JanusGraph/Cassandra/ElasticSearch in production in a VM with 8GB/RAM?

The system has more reads than writes, we create 10 vertex per hour and the system has a average of 20 concurrents users reading the database. Just an Category/User/Ads manager with only 300k or a little more vertex. And basic traversals.

Can i have some problems or its possible?

How can i determine if its possible? Thank you.


Run JanusGraph/Cassandra/ElasticSearch in production in a VM with 8GB/RAM

"p...@pwill.com.br" <pw...@...>
 

Hi, is possible to Run JanusGraph/Cassandra/ElasticSearch in production in a VM with 8GB/RAM?

The system has more reads than writes, we create 10 vertex per hour and the system has a average of 20 concurrents users reading the database. Just an Category/User/Ads manager with only 300k or a little more vertex. And basic traversals.

Can i have some problems or its possible?

How can i determine if its possible? Thank you.


Use index for sorting

toom <to...@...>
 

Hello,

I'm using JanusGraph with Cassandra (0.5.2) and ElasticSearch.

I try to optimize my queries and use the mixed indexes as much as possible, in particular for sortings, but I have some difficulties:

It is not possible to sort by properties that can have missing values (or I get a "The property does not exist as the key has no associated value for the provided element"). Therefore I used ".order().by(coalesce(values('closingDate'), new Date()))" but in this case, the index is not used.

If there is only one sorting criterion, I probably can do something like:

g.inject(1).union(
  g.V().hasLabel('Case').has('closingDate').order().by('closingDate'),
  g.V().hasLabel('Case').hasNot('closingDate'))

But what is my best option if I want to use several criteria?


I also note that the FilterRankingStrategy strategy can have negative effect on performance when there are filters that don't use  index. For example, the following query is faster without step reordering.

g.V().hasLabel('Case').has('closingDate').order().by('closingDate').filter(out('attachment').has('file'))

FilterRanking swaps order() and filter() steps and then index is not used for sorting.

Any help will be much appreciated.

Toom.


Re: throw NullPointerException when query with hasLabel() script

阳生丙 <ouyang....@...>
 

it is reproducible in my environment. and this problem is not appeared when janusgraph server started. 

it is always there after i running into it for the first time, continuing to now

i have another server running, but there is no this probelm.

what informations do you want me to provide? the only log messages while i submit above gremlin script are posted above.


在2020年11月30日星期一 UTC+8 下午6:52:31<HadoopMarc> 写道:

Hi,

Can you please send something going wrong that is reproducible? Below you find an example on the latest janusgraph-full-0.5.2 where hasLabel() works fine.

marc:/tera/lib/janusgraph-full-0.5.2$ bin/janusgraph.sh start
Forking Cassandra...
Running `nodetool statusthrift`..... OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9200)........ OK (connected to 127.0.0.1:9200).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)...... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.

marc:/tera/lib/janusgraph-full-0.5.2$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
gremlin> g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"))
==>graphtraversalsource[emptygraph[empty], standard]
gremlin> g.V().elementMap()
==>[id:4320,label:monster,name:cerberus]
==>[id:8320,label:demigod,name:hercules,age:30]
==>[id:16632,label:god,name:pluto,age:4000]
==>[id:4272,label:titan,name:saturn,age:10000]
==>[id:8368,label:location,name:sea]
==>[id:4200,label:monster,name:nemean]
==>[id:20728,label:monster,name:hydra]
==>[id:24824,label:location,name:tartarus]
==>[id:8440,label:god,name:neptune,age:4500]
==>[id:4224,label:god,name:jupiter,age:5000]
==>[id:12536,label:human,name:alcmene,age:45]
==>[id:4344,label:location,name:sky]
gremlin> g.V().hasLabel('monster').elementMap()
==>[id:4320,label:monster,name:cerberus]
==>[id:4200,label:monster,name:nemean]
==>[id:20728,label:monster,name:hydra]
gremlin> g.V().hasLabel('monster').has('name', 'cerberus').elementMap()
==>[id:4320,label:monster,name:cerberus]
gremlin>


Best wishes,    Marc

Op maandag 30 november 2020 om 08:52:34 UTC+1 schreef ouy...@...:
to HadoopMarc:

it is not about double quote. the problem is: when gremlin script contains hasLabel() clause, janus server will throws NullPointerException.

i use console and gremlin driver to fire these query to remote janusgraph server.



在2020年11月28日星期六 UTC+8 上午2:48:23<HadoopMarc> 写道:
Hi,

Do you mean that the only difference  between the first and the second script is the presence of the closing double quote? I see the stacktrace is from Gremlin Server: did you use Gremlin Console to fire the query? What happens if you make a GLV/bytecode connection instead of a connection for groovy scripts, that is using:

g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"))

Best wishes,    Marc

Op vrijdag 27 november 2020 om 19:05:24 UTC+1 schreef ouy...@...:
query with gremlin script:  "g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value')"
the returned result is expected and correct.

but query with gremlin script: "g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value')

throws NullPointerException.

the exception stack info is as follow:

2020-11-27 12:07:35,227 WARN  [gremlin-server-exec-10] AbstractEvalOpProcessor.java:311 - Exception processing a script on request [RequestMessage{, requestId=cb1f0ab6-cee6-47c4-bbc1-196f27b1b14c, op='eval', processor='', args={gremlin=g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value'), batchSize=64}}].
java.lang.NullPointerException
        at org.janusgraph.graphdb.util.ElementHelper.getValues(ElementHelper.java:41)
        at org.janusgraph.graphdb.query.condition.PredicateCondition.evaluate(PredicateCondition.java:68)
        at org.janusgraph.graphdb.query.condition.And.evaluate(And.java:55)
        at org.janusgraph.graphdb.query.graph.GraphCentricQuery.matches(GraphCentricQuery.java:153)
        at org.janusgraph.graphdb.query.QueryProcessor.lambda$getFilterIterator$2(QueryProcessor.java:133)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:652)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
        at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
        at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at com.google.common.collect.Iterators$5.hasNext(Iterators.java:547)
        at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
        at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
        at org.janusgraph.graphdb.util.MultiDistinctOrderedIterator.hasNext(MultiDistinctOrderedIterator.java:75)
        at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:149)
        at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:197)
        at org.apache.tinkerpop.gremlin.server.op.AbstractOpProcessor.handleIterator(AbstractOpProcessor.java:146)
        at org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor.lambda$evalOpInternal$5(AbstractEvalOpProcessor.java:264)
        at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:278)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


and the problem is reproducible,  is it a bug?

enviroment:
janusgraph version: 0.5.2
backend cassandra version: 3.11



Re: OLAP, Hadoop, Spark and Cassandra

HadoopMarc <bi...@...>
 

Hi Mladen,

Interesting read! Spark is not very sensitive to the number of tasks. I believe that for OLAP on HadoopGraph the optimum is for partitions of 256 Mb or so. Larger is difficult to hold in memory for reasonably sized executors. Smaller gives too much overhead. OLAP with janusgraph-hbase is much harder, because the partition size is determined by the HBase regions which need to be large (10GB). Also note that the entire graph needs to fit into the total memory of all executors  because graph traversing is shuffle-heavy and spilling to disk will take endlessly.

Best wishes,    Marc

Op maandag 30 november 2020 om 19:09:15 UTC+1 schreef Mladen Marović:

I know I'm quite late to the party, but for future reference - the number of input partitions in Spark depends on the partitioning of the source. In case of cassandra, partitioning is determined by the number of tokens each node gets (as configured by `num_tokens` in `cassandra.yaml`), which is set to 256 by default. So, if you have a 3-node cassandra cluster, by default each node should get 256 tokens, which would result in 3*256 = 768 tokens total. Since Spark reads directly from cassandra (if you're using `org.janusgraph.hadoop.formats.cql.CqlInputFormat`), that translates to 768 partitions in the input Spark RDD, or 768 tasks during processing. Add to that 1 task that collects results, or something similar, and you end up at 769. At least that was my experience.

The default value of 256 for `num_tokens` made sense in older versions, but in cassandra 3.x a new token allocation algorithm was implemented to improve performance for operations requiring token-range scans, which is precisely what Spark does. I experimented a bit with smaller values (e.g. 16) and managed to drastically reduce the number of tasks when scanning the entire graph. For further, reading, I recommend this article.



On Thursday, December 5, 2019 at 9:28:26 AM UTC+1 s...@... wrote:
Answering my own question - turned out I had had a mixup of keyspaces used between the two instances

Default the conf/hadoop-graph/read-cql.properties reads

janusgraphmr.ioformat.conf.storage.cassandra.keyspace

While for CQL it should read

janusgraphmr.ioformat.conf.storage.cql.keyspace

Also - as I made a 'named' (ve_graph) graph I had to point to that one rather than the janusgraph keyspace.

Problem 1 solved. Now to the next - how can I lower the number of 'partitions' Spark is using (here 796  '... on localhost (executor driver) (769/769)')?  

On Wednesday, December 4, 2019 at 11:46:42 PM UTC+1, Sture Lygren wrote:
Hi,

I'm trying to get JanusGraph 0.4.0 with a Cassandra (CQL) backend setup and running as OLAP while still keeping OLTP active in order to do graph updates. I've been searching high and low for some guidance, but so far without any luck. Hopefully someone here could tune in and help?

Here's where I'm at currently

  • local Hadoop running according to https://old-docs.janusgraph.org/0.4.0/hadoop-tp3.html
  • gremlin server started as /bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration.yaml
  • gremlin-server-configuration.yaml points to init.groovy script doing the traversal mappings for OLTP and OLAP
def globals = [:]
ve = ConfiguredGraphFactory.open("ve_graph")
OLAPGraph = GraphFactory.open('conf/hadoop-graph/read-cql.properties')
globals << [g : ve.traversal(), sg: OLAPGraph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)]
  • conf/hadoop-graph/read-cql.properties reads
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
  • Running the gremlin shell I have
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/sture/Scripts/janusgraph-0.4.0-hadoop2/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/sture/Scripts/janusgraph-0.4.0-hadoop2/lib/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
plugin activated: tinkerpop.server
plugin activated: tinkerpop.tinkergraph
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[655848fc-b46e-40be-8174-f0dc42cdabd4]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[655848fc-b46e-40be-8174-f0dc42cdabd4] - type ':remote console' to return to local mode
gremlin> g
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin>
gremlin> sg
==>graphtraversalsource[hadoopgraph[cqlinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().has('lbl','System').count()
==>68
gremlin> sg.V().has('lbl','System').count()
  • The job is running for some time and while finishing the gremlin-server.log reads
253856 [Executor task launch worker for task 768] INFO  org.apache.spark.executor.Executor  - Finished task 768.0 in stage 0.0 (TID 768). 2388 bytes result sent to driver
253858 [task-result-getter-1] INFO  org.apache.spark.scheduler.TaskSetManager  - Finished task 768.0 in stage 0.0 (TID 768) in 6809 ms on localhost (executor driver) (769/769)
253861 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - ResultStage 0 (fold at SparkStarBarrierInterceptor.java:101) finished in 161.427 s
253861 [task-result-getter-1] INFO  org.apache.spark.scheduler.TaskSchedulerImpl  - Removed TaskSet 0.0, whose tasks have all completed, from pool
253876 [SparkGraphComputer-boss] INFO  org.apache.spark.scheduler.DAGScheduler  - Job 0 finished: fold at SparkStarBarrierInterceptor.java:101, took 161.598267 s
253888 [SparkGraphComputer-boss] INFO  org.apache.spark.rdd.MapPartitionsRDD  - Removing RDD 1 from persistence list
253901 [block-manager-slave-async-thread-pool-0] INFO  org.apache.spark.storage.BlockManager  - Removing RDD 1
  • However - the count (==> ) reads 0 for the sg traversal
I've most likely missed some crucial point here, but I'm not able to spot it. Please help.



Re: SimplePath query is slower in 6 node vs 3 node Cassandra cluster

Varun Ganesh <operatio...@...>
 

Hi Boxuan,

Thank you for getting back to me. Please find my responses below:

> Did you check the hardware differences? 
Yes I can confirm that the two clusters are identical except for the number of nodes.

> the data involved in your query is probably distributed across nodes
This was our initial guess as well. However, if that was the case, we should technically observe this slowness for all the queries that we try. But it is only observed for "path" queries.

For instance, here's an example of another traversal query where we observe the SAME latency across the 3 and 6 node clusters:
g.V().hasLabel('label_B').has('some_id', 123).has('data.name', 1234567).both('sample_edge').valueMap('data.field1', 'data.field2').next(10)

> Then there would be fewer round-trips happening within the 3-node cluster
I also want to point out that we are not running the Janusgraph in embedded mode (where it is colocated with Cassandra), instead it is running separately on its own server nodes

> Of course with larger cluster you can achieve higher throughput
Interestingly we are not observing any difference in the throughput (i.e. the maximum queries per second that can be handled without seeing timeouts) between the two clusters

Would appreciate any input on where/how we could possibly investigate further.

Thank you!
Varun

On Thursday, November 26, 2020 at 11:19:32 AM UTC-5 li...@... wrote:
Hi,

> why the query is 3x slower on 6 nodes

Did you check the hardware differences? Probably the 6-node cluster has slower network, less memory, slower disk, etc.
Another possibility that I can think of is, the data involved in your query is probably distributed across nodes. Since your 3-node cassandra cluster has 3x replication factor, I would presume all data you have is available on every node. Then there would be fewer round-trips happening within the 3-node cluster.
Generally it makes sense to me that the latency of a small cluster is shorter than that of a large cluster, as long as both clusters are not fully loaded. Of course with larger cluster you can achieve higher throughput.

> the profile step above shows a time taken of >1000ms

This can be a bug in profiling. If you can provide a minimal example to reproduce, that would be very helpful.

Best regards,
Boxuan


On Nov 25, 2020, at 6:04 AM, Varun Ganesh <oper...@...> wrote:

Just an additional note,  you may have noticed that the profile step above shows a time taken of >1000ms. I do not know why this is the case.

When run on the console without profile, it reflects the true time taken:
 gremlin> clockWithResult(10) { graph.tx().rollback(); g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').limit(1).next() }
 ==>130.9545608

Thanks!

On Tuesday, November 24, 2020 at 4:35:22 PM UTC-5 Varun Ganesh wrote:
Hello,

I am currently using Janusgraph version 0.5.2. I have a graph with about 18 million vertices and 25 million edges.

I have two versions of this graph, one backed by a 3 node Cassandra cluster and another backed by 6 Cassandra nodes (both with 3x replication factor)

I am running the below query on both of them:

g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').next()

The issue is that this query takes ~130ms on the 3 node cluster whereas it takes ~400ms on the 6 node cluster.

I have tried running ".profile()" on both versions and the outputs are almost identical in terms of the steps and time taken.

g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').limit(1).profile()

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(label_A), o...                     1           1           4.582     0.39
    \_condition=(~label = label_A AND some_id = 123 AND data.name = value1)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]@8000
    \_index=someVertexByNameComposite
  optimization                                                                                 0.028
  optimization                                                                                 0.907
  backend-query                                                        1                       3.012
    \_query=someVertexByNameComposite:multiKSQ[1]@8000
    \_limit=8000
RepeatStep([JanusGraphVertexStep(BOTH,[...                     2           2        1167.493    99.45
  HasStep([data.name.eq(...                                                          803.247
  JanusGraphVertexStep(BOTH,[...                           12934       12934         334.095
    \_condition=type[sample_edge]
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
    \_multi=true
    \_vertices=264
    optimization                                                                               0.073
    backend-query                                                    266                       5.640
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
    optimization                                                                               0.028
    backend-query                                                  12689                     312.544
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
  PathFilterStep(simple)                                           12441       12441          10.980
  JanusGraphMultiQueryStep(RepeatEndStep)                           1187        1187          11.825
  RepeatEndStep                                                        2           2         810.468
RangeGlobalStep(0,1)                                                   1           1           0.419     0.04
PathStep([value(data.name)])                                 1           1           1.474     0.13
                                            >TOTAL                     -           -        1173.969        -

I'd really appreciate some input on figuring out why the query is 3x slower on 6 nodes.

I realise that you may require more context. Happy to provide more information as required!

 Thank you!
 

-- 
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/6d2483f7-062a-4a95-98b2-6b4aafa87cd3n%40googlegroups.com.


Re: OLAP, Hadoop, Spark and Cassandra

Mladen Marović <mladen...@...>
 

I know I'm quite late to the party, but for future reference - the number of input partitions in Spark depends on the partitioning of the source. In case of cassandra, partitioning is determined by the number of tokens each node gets (as configured by `num_tokens` in `cassandra.yaml`), which is set to 256 by default. So, if you have a 3-node cassandra cluster, by default each node should get 256 tokens, which would result in 3*256 = 768 tokens total. Since Spark reads directly from cassandra (if you're using `org.janusgraph.hadoop.formats.cql.CqlInputFormat`), that translates to 768 partitions in the input Spark RDD, or 768 tasks during processing. Add to that 1 task that collects results, or something similar, and you end up at 769. At least that was my experience.

The default value of 256 for `num_tokens` made sense in older versions, but in cassandra 3.x a new token allocation algorithm was implemented to improve performance for operations requiring token-range scans, which is precisely what Spark does. I experimented a bit with smaller values (e.g. 16) and managed to drastically reduce the number of tasks when scanning the entire graph. For further, reading, I recommend this article.



On Thursday, December 5, 2019 at 9:28:26 AM UTC+1 s...@... wrote:
Answering my own question - turned out I had had a mixup of keyspaces used between the two instances

Default the conf/hadoop-graph/read-cql.properties reads

janusgraphmr.ioformat.conf.storage.cassandra.keyspace

While for CQL it should read

janusgraphmr.ioformat.conf.storage.cql.keyspace

Also - as I made a 'named' (ve_graph) graph I had to point to that one rather than the janusgraph keyspace.

Problem 1 solved. Now to the next - how can I lower the number of 'partitions' Spark is using (here 796  '... on localhost (executor driver) (769/769)')?  

On Wednesday, December 4, 2019 at 11:46:42 PM UTC+1, Sture Lygren wrote:
Hi,

I'm trying to get JanusGraph 0.4.0 with a Cassandra (CQL) backend setup and running as OLAP while still keeping OLTP active in order to do graph updates. I've been searching high and low for some guidance, but so far without any luck. Hopefully someone here could tune in and help?

Here's where I'm at currently

  • local Hadoop running according to https://old-docs.janusgraph.org/0.4.0/hadoop-tp3.html
  • gremlin server started as /bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration.yaml
  • gremlin-server-configuration.yaml points to init.groovy script doing the traversal mappings for OLTP and OLAP
def globals = [:]
ve = ConfiguredGraphFactory.open("ve_graph")
OLAPGraph = GraphFactory.open('conf/hadoop-graph/read-cql.properties')
globals << [g : ve.traversal(), sg: OLAPGraph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)]
  • conf/hadoop-graph/read-cql.properties reads
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
  • Running the gremlin shell I have
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/sture/Scripts/janusgraph-0.4.0-hadoop2/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/sture/Scripts/janusgraph-0.4.0-hadoop2/lib/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
plugin activated: tinkerpop.server
plugin activated: tinkerpop.tinkergraph
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[655848fc-b46e-40be-8174-f0dc42cdabd4]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[655848fc-b46e-40be-8174-f0dc42cdabd4] - type ':remote console' to return to local mode
gremlin> g
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin>
gremlin> sg
==>graphtraversalsource[hadoopgraph[cqlinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().has('lbl','System').count()
==>68
gremlin> sg.V().has('lbl','System').count()
  • The job is running for some time and while finishing the gremlin-server.log reads
253856 [Executor task launch worker for task 768] INFO  org.apache.spark.executor.Executor  - Finished task 768.0 in stage 0.0 (TID 768). 2388 bytes result sent to driver
253858 [task-result-getter-1] INFO  org.apache.spark.scheduler.TaskSetManager  - Finished task 768.0 in stage 0.0 (TID 768) in 6809 ms on localhost (executor driver) (769/769)
253861 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - ResultStage 0 (fold at SparkStarBarrierInterceptor.java:101) finished in 161.427 s
253861 [task-result-getter-1] INFO  org.apache.spark.scheduler.TaskSchedulerImpl  - Removed TaskSet 0.0, whose tasks have all completed, from pool
253876 [SparkGraphComputer-boss] INFO  org.apache.spark.scheduler.DAGScheduler  - Job 0 finished: fold at SparkStarBarrierInterceptor.java:101, took 161.598267 s
253888 [SparkGraphComputer-boss] INFO  org.apache.spark.rdd.MapPartitionsRDD  - Removing RDD 1 from persistence list
253901 [block-manager-slave-async-thread-pool-0] INFO  org.apache.spark.storage.BlockManager  - Removing RDD 1
  • However - the count (==> ) reads 0 for the sg traversal
I've most likely missed some crucial point here, but I'm not able to spot it. Please help.



HBase and Solr Standalone index not work

sefty nindyastuti <sefty...@...>
 

hello, I want to ask how to be able to index solr and hbase storage, because I tried that the data could only enter hbase but could not index data to solr, I used hbase version 2.1.0, zookeper version 3.4.12, spark version 2.3 .2 and solr version 7.0.0,  please help to solve the problem.

following the janusgraph-hbase-solr.properties  

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.hostname=X.X.X.X, X.X.X.X, X.X.X.X
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=solr
index.search.solr.mode=http
index.search.solr.http-urls=http://X.X.X.X:8886/solr

gremlin-server.yaml config as shown below  : 

host: 0.0.0.0
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
graphs: { graph:
    conf/janusgraph-hbase-solr.properties}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
     - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
       - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
         - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
           - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
             - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
               - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
               processors:
                 - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
                   - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
                   metrics: {
                     consoleReporter: {enabled: true, interval: 180000},
                       csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
                         jmxReporter: {enabled: true},
                           slf4jReporter: {enabled: true, interval: 180000},
gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
                               graphiteReporter: {enabled: false, interval: 180000}}
                               maxInitialLineLength: 4096
                               maxHeaderSize: 8192
                               maxChunkSize: 8192
                               maxContentLength: 65536
                               maxAccumulationBufferComponents: 1024
                               resultIterationBatchSize: 64
                               writeBufferLowWaterMark: 32768
                               writeBufferHighWaterMark: 65536


gremlin-server-configuration.yaml config as shown below  : 

host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWebSocketChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/JanusGraph-configurationmanagement.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: []}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536


config JanusGraph-configurationmanagement.properties as below :
 
gremlin.graph=org.janusgraph.core.ConfiguredGraphFactory
graph.graphname=testgraph
storage.backend=hbase
graph.graphname=ConfigurationManagementGraph
storage.hostname=X.X.X.X, X.X.X.X, X.X.X.X
index.search.backend=solr
index.search.solr.mode=http
#CORE_NAME=search
index.search.solr.http-urls=http://X.X.X.X:8886/solr
#index.search.solr.zookeeper-url=X.X.X.X:2181


Re: throw NullPointerException when query with hasLabel() script

HadoopMarc <bi...@...>
 

Hi,

Can you please send something going wrong that is reproducible? Below you find an example on the latest janusgraph-full-0.5.2 where hasLabel() works fine.

marc:/tera/lib/janusgraph-full-0.5.2$ bin/janusgraph.sh start
Forking Cassandra...
Running `nodetool statusthrift`..... OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9200)........ OK (connected to 127.0.0.1:9200).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)...... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.

marc:/tera/lib/janusgraph-full-0.5.2$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
gremlin> g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"))
==>graphtraversalsource[emptygraph[empty], standard]
gremlin> g.V().elementMap()
==>[id:4320,label:monster,name:cerberus]
==>[id:8320,label:demigod,name:hercules,age:30]
==>[id:16632,label:god,name:pluto,age:4000]
==>[id:4272,label:titan,name:saturn,age:10000]
==>[id:8368,label:location,name:sea]
==>[id:4200,label:monster,name:nemean]
==>[id:20728,label:monster,name:hydra]
==>[id:24824,label:location,name:tartarus]
==>[id:8440,label:god,name:neptune,age:4500]
==>[id:4224,label:god,name:jupiter,age:5000]
==>[id:12536,label:human,name:alcmene,age:45]
==>[id:4344,label:location,name:sky]
gremlin> g.V().hasLabel('monster').elementMap()
==>[id:4320,label:monster,name:cerberus]
==>[id:4200,label:monster,name:nemean]
==>[id:20728,label:monster,name:hydra]
gremlin> g.V().hasLabel('monster').has('name', 'cerberus').elementMap()
==>[id:4320,label:monster,name:cerberus]
gremlin>


Best wishes,    Marc

Op maandag 30 november 2020 om 08:52:34 UTC+1 schreef ouy...@...:

to HadoopMarc:

it is not about double quote. the problem is: when gremlin script contains hasLabel() clause, janus server will throws NullPointerException.

i use console and gremlin driver to fire these query to remote janusgraph server.



在2020年11月28日星期六 UTC+8 上午2:48:23<HadoopMarc> 写道:
Hi,

Do you mean that the only difference  between the first and the second script is the presence of the closing double quote? I see the stacktrace is from Gremlin Server: did you use Gremlin Console to fire the query? What happens if you make a GLV/bytecode connection instead of a connection for groovy scripts, that is using:

g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"))

Best wishes,    Marc

Op vrijdag 27 november 2020 om 19:05:24 UTC+1 schreef ouy...@...:
query with gremlin script:  "g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value')"
the returned result is expected and correct.

but query with gremlin script: "g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value')

throws NullPointerException.

the exception stack info is as follow:

2020-11-27 12:07:35,227 WARN  [gremlin-server-exec-10] AbstractEvalOpProcessor.java:311 - Exception processing a script on request [RequestMessage{, requestId=cb1f0ab6-cee6-47c4-bbc1-196f27b1b14c, op='eval', processor='', args={gremlin=g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value'), batchSize=64}}].
java.lang.NullPointerException
        at org.janusgraph.graphdb.util.ElementHelper.getValues(ElementHelper.java:41)
        at org.janusgraph.graphdb.query.condition.PredicateCondition.evaluate(PredicateCondition.java:68)
        at org.janusgraph.graphdb.query.condition.And.evaluate(And.java:55)
        at org.janusgraph.graphdb.query.graph.GraphCentricQuery.matches(GraphCentricQuery.java:153)
        at org.janusgraph.graphdb.query.QueryProcessor.lambda$getFilterIterator$2(QueryProcessor.java:133)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:652)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
        at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
        at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at com.google.common.collect.Iterators$5.hasNext(Iterators.java:547)
        at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
        at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
        at org.janusgraph.graphdb.util.MultiDistinctOrderedIterator.hasNext(MultiDistinctOrderedIterator.java:75)
        at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:149)
        at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:197)
        at org.apache.tinkerpop.gremlin.server.op.AbstractOpProcessor.handleIterator(AbstractOpProcessor.java:146)
        at org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor.lambda$evalOpInternal$5(AbstractEvalOpProcessor.java:264)
        at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:278)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


and the problem is reproducible,  is it a bug?

enviroment:
janusgraph version: 0.5.2
backend cassandra version: 3.11



Re: throw NullPointerException when query with hasLabel() script

阳生丙 <ouyang....@...>
 

to HadoopMarc:

it is not about double quote. the problem is: when gremlin script contains hasLabel() clause, janus server will throws NullPointerException.

i use console and gremlin driver to fire these query to remote janusgraph server.



在2020年11月28日星期六 UTC+8 上午2:48:23<HadoopMarc> 写道:

Hi,

Do you mean that the only difference  between the first and the second script is the presence of the closing double quote? I see the stacktrace is from Gremlin Server: did you use Gremlin Console to fire the query? What happens if you make a GLV/bytecode connection instead of a connection for groovy scripts, that is using:

g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"))

Best wishes,    Marc

Op vrijdag 27 november 2020 om 19:05:24 UTC+1 schreef ouy...@...:
query with gremlin script:  "g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value')"
the returned result is expected and correct.

but query with gremlin script: "g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value')

throws NullPointerException.

the exception stack info is as follow:

2020-11-27 12:07:35,227 WARN  [gremlin-server-exec-10] AbstractEvalOpProcessor.java:311 - Exception processing a script on request [RequestMessage{, requestId=cb1f0ab6-cee6-47c4-bbc1-196f27b1b14c, op='eval', processor='', args={gremlin=g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value'), batchSize=64}}].
java.lang.NullPointerException
        at org.janusgraph.graphdb.util.ElementHelper.getValues(ElementHelper.java:41)
        at org.janusgraph.graphdb.query.condition.PredicateCondition.evaluate(PredicateCondition.java:68)
        at org.janusgraph.graphdb.query.condition.And.evaluate(And.java:55)
        at org.janusgraph.graphdb.query.graph.GraphCentricQuery.matches(GraphCentricQuery.java:153)
        at org.janusgraph.graphdb.query.QueryProcessor.lambda$getFilterIterator$2(QueryProcessor.java:133)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:652)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
        at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
        at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at com.google.common.collect.Iterators$5.hasNext(Iterators.java:547)
        at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
        at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
        at org.janusgraph.graphdb.util.MultiDistinctOrderedIterator.hasNext(MultiDistinctOrderedIterator.java:75)
        at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:149)
        at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:197)
        at org.apache.tinkerpop.gremlin.server.op.AbstractOpProcessor.handleIterator(AbstractOpProcessor.java:146)
        at org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor.lambda$evalOpInternal$5(AbstractEvalOpProcessor.java:264)
        at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:278)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


and the problem is reproducible,  is it a bug?

enviroment:
janusgraph version: 0.5.2
backend cassandra version: 3.11



Re: Configuring Transaction Log feature

Pawan Shriwas <shriwa...@...>
 

one correction to last post in below line.

    JanusGraphTransaction tx = graph.buildTransaction().logIdentifier("TestLog").start();



On Saturday, 28 November 2020 at 18:16:09 UTC+5:30 Pawan Shriwas wrote:
Hi Sandeep,

Please see below java code and properties information which I am trying in local with Cassandra cql as backend.  This code is not giving me the change log as event which I can get via gremlin console with same script and properties. Please let me know if anything needs to be modify here with code or properties.

<!-- Java Code -->
package com.example.graph;

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.core.JanusGraphVertex;
import org.janusgraph.core.log.ChangeProcessor;
import org.janusgraph.core.log.ChangeState;
import org.janusgraph.core.log.LogProcessorFramework;
import org.janusgraph.core.log.TransactionId;

public class TestLog {
public static void listenLogsEvent(){
JanusGraph graph = JanusGraphFactory.open("/home/ist/Downloads/IM/jgraphdb_local.properties");
LogProcessorFramework logProcessor = JanusGraphFactory.openTransactionLog(graph);

logProcessor.addLogProcessor("TestLog").
    setProcessorIdentifier("TestLogCounter").
    setStartTimeNow().
    addProcessor(new ChangeProcessor(){
        @Override
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
        System.out.println("tx--"+tx.toString());
        System.out.println("txId--"+txId.toString());
        System.out.println("changeState--"+changeState.toString());
       }
    }).
    build();
for(int i=0;i<=10;i++) {
        System.out.println("going to add ="+i);
    JanusGraphTransaction tx = graph.buildTransaction().logIdentifier("PawanTestLog").start();
    JanusGraphVertex a = tx.addVertex("TimeL");
    a.property("type", "HOLD");
    a.property("serialNo", "XS31B4");
    tx.commit();
        System.out.println("Vertex committed ="+a.toString());
}
}
public static void main(String[] args) {
System.out.println("starting main");
listenLogsEvent();
}
}

<!----- graph properties------->
gremlin.graph=org.janusgraph.core.JanusGraphFactory
graph.name=TestGraph
storage.backend = cql
storage.hostname = localhost
storage.cql.keyspace=janusgraphcql
query.fast-property = true
storage.lock.wait-time=10000
storage.batch-loading=true

Thanks in advance.

Thanks,
Pawan


On Saturday, 28 November 2020 at 16:19:20 UTC+5:30 sa...@... wrote:
Pawan,
Can you elaborate more on the program where your are trying to embed the script in?
Regards,
Sandeep

On Sat, 28 Nov 2020, 13:48 Pawan Shriwas, <shr...@...> wrote:
Hey Jason,

Same thing happen with my as well where above script work well in gremlin console  but when we use it in java. we are not getting anything in process() section as callback. Could you help for the same.  


On Wednesday, 7 February 2018 at 20:28:41 UTC+5:30 Jason Plurad wrote:
It means that it will use the 'storage.backend' value as the storage. See the code in GraphDatabaseConfiguration.java. It looks like your only choice is 'default', and it seems like the option is there for the future possibility to use a different backend.

The code in the docs seemed to work ok, other than a minor change in the setStartTime() parameters. You can cut and paste this code into the Gremlin Console to use with the prepackaged distribution.

import java.util.concurrent.atomic.*;
import org.janusgraph.core.log.*;
import java.util.concurrent.*;

graph
= JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties');

totalHumansAdded
= new AtomicInteger(0);
totalGodsAdded
= new AtomicInteger(0);
logProcessor
= JanusGraphFactory.openTransactionLog(graph);
logProcessor
.addLogProcessor("addedPerson").
        setProcessorIdentifier
("addedPersonCounter").
        setStartTime
(Instant.now()).
        addProcessor
(new ChangeProcessor() {
           
public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
               
for (v in changeState.getVertices(Change.ADDED)) {
                   
if (v.label().equals("human")) totalHumansAdded.incrementAndGet();
                   
System.out.println("total humans = " + totalHumansAdded);
               
}
           
}
       
}).
        addProcessor
(new ChangeProcessor() {
           
public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
               
for (v in changeState.getVertices(Change.ADDED)) {
                   
if (v.label().equals("god")) totalGodsAdded.incrementAndGet();
                   
System.out.println("total gods = " + totalGodsAdded);
               
}
           
}
       
}).
        build
()

tx
= graph.buildTransaction().logIdentifier("addedPerson").start();
u
= tx.addVertex(T.label, "human");
u
.property("name", "proteros");
u
.property("age", 36);
tx
.commit();

If you inspect the keyspace in Cassandra afterwards, you'll see that a separate table is created for "ulog_addedPerson".

Did you have some example code of what you are attempting?



On Wednesday, February 7, 2018 at 5:55:58 AM UTC-5, Sandeep Mishra wrote:
Hi Guys,

We are trying to used transaction log feature of Janusgraph, which is not working as expected.No callback is received at
public void process(JanusGraphTransaction janusGraphTransaction, TransactionId transactionId, ChangeState changeState) {

Janusgraph documentation says value for log.[X].backend is 'default'.
Not sure what exactly it means. does it mean HBase which is being used as backend for data.

Please let  me know, if anyone has configured it.

Thanks and Regards,
Sandeep Mishra

--
You received this message because you are subscribed to a topic in the Google Groups "JanusGraph users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/janusgraph-users/JN4ZsB9_DMM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to janusgr...@....


Re: Configuring Transaction Log feature

Pawan Shriwas <shriwa...@...>
 

Hi Sandeep,

Please see below java code and properties information which I am trying in local with Cassandra cql as backend.  This code is not giving me the change log as event which I can get via gremlin console with same script and properties. Please let me know if anything needs to be modify here with code or properties.

<!-- Java Code -->
package com.example.graph;

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.core.JanusGraphVertex;
import org.janusgraph.core.log.ChangeProcessor;
import org.janusgraph.core.log.ChangeState;
import org.janusgraph.core.log.LogProcessorFramework;
import org.janusgraph.core.log.TransactionId;

public class TestLog {
public static void listenLogsEvent(){
JanusGraph graph = JanusGraphFactory.open("/home/ist/Downloads/IM/jgraphdb_local.properties");
LogProcessorFramework logProcessor = JanusGraphFactory.openTransactionLog(graph);

logProcessor.addLogProcessor("TestLog").
    setProcessorIdentifier("TestLogCounter").
    setStartTimeNow().
    addProcessor(new ChangeProcessor(){
        @Override
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
        System.out.println("tx--"+tx.toString());
        System.out.println("txId--"+txId.toString());
        System.out.println("changeState--"+changeState.toString());
       }
    }).
    build();
for(int i=0;i<=10;i++) {
        System.out.println("going to add ="+i);
    JanusGraphTransaction tx = graph.buildTransaction().logIdentifier("PawanTestLog").start();
    JanusGraphVertex a = tx.addVertex("TimeL");
    a.property("type", "HOLD");
    a.property("serialNo", "XS31B4");
    tx.commit();
        System.out.println("Vertex committed ="+a.toString());
}
}
public static void main(String[] args) {
System.out.println("starting main");
listenLogsEvent();
}
}

<!----- graph properties------->
gremlin.graph=org.janusgraph.core.JanusGraphFactory
graph.name=TestGraph
storage.backend = cql
storage.hostname = localhost
storage.cql.keyspace=janusgraphcql
query.fast-property = true
storage.lock.wait-time=10000
storage.batch-loading=true

Thanks in advance.

Thanks,
Pawan


On Saturday, 28 November 2020 at 16:19:20 UTC+5:30 sa...@... wrote:
Pawan,
Can you elaborate more on the program where your are trying to embed the script in?
Regards,
Sandeep

On Sat, 28 Nov 2020, 13:48 Pawan Shriwas, <shr...@...> wrote:
Hey Jason,

Same thing happen with my as well where above script work well in gremlin console  but when we use it in java. we are not getting anything in process() section as callback. Could you help for the same.  


On Wednesday, 7 February 2018 at 20:28:41 UTC+5:30 Jason Plurad wrote:
It means that it will use the 'storage.backend' value as the storage. See the code in GraphDatabaseConfiguration.java. It looks like your only choice is 'default', and it seems like the option is there for the future possibility to use a different backend.

The code in the docs seemed to work ok, other than a minor change in the setStartTime() parameters. You can cut and paste this code into the Gremlin Console to use with the prepackaged distribution.

import java.util.concurrent.atomic.*;
import org.janusgraph.core.log.*;
import java.util.concurrent.*;

graph
= JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties');

totalHumansAdded
= new AtomicInteger(0);
totalGodsAdded
= new AtomicInteger(0);
logProcessor
= JanusGraphFactory.openTransactionLog(graph);
logProcessor
.addLogProcessor("addedPerson").
        setProcessorIdentifier
("addedPersonCounter").
        setStartTime
(Instant.now()).
        addProcessor
(new ChangeProcessor() {
           
public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
               
for (v in changeState.getVertices(Change.ADDED)) {
                   
if (v.label().equals("human")) totalHumansAdded.incrementAndGet();
                   
System.out.println("total humans = " + totalHumansAdded);
               
}
           
}
       
}).
        addProcessor
(new ChangeProcessor() {
           
public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
               
for (v in changeState.getVertices(Change.ADDED)) {
                   
if (v.label().equals("god")) totalGodsAdded.incrementAndGet();
                   
System.out.println("total gods = " + totalGodsAdded);
               
}
           
}
       
}).
        build
()

tx
= graph.buildTransaction().logIdentifier("addedPerson").start();
u
= tx.addVertex(T.label, "human");
u
.property("name", "proteros");
u
.property("age", 36);
tx
.commit();

If you inspect the keyspace in Cassandra afterwards, you'll see that a separate table is created for "ulog_addedPerson".

Did you have some example code of what you are attempting?



On Wednesday, February 7, 2018 at 5:55:58 AM UTC-5, Sandeep Mishra wrote:
Hi Guys,

We are trying to used transaction log feature of Janusgraph, which is not working as expected.No callback is received at
public void process(JanusGraphTransaction janusGraphTransaction, TransactionId transactionId, ChangeState changeState) {

Janusgraph documentation says value for log.[X].backend is 'default'.
Not sure what exactly it means. does it mean HBase which is being used as backend for data.

Please let  me know, if anyone has configured it.

Thanks and Regards,
Sandeep Mishra

--
You received this message because you are subscribed to a topic in the Google Groups "JanusGraph users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/janusgraph-users/JN4ZsB9_DMM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to janusgr...@....


Re: Configuring Transaction Log feature

Sandeep Mishra <sandy...@...>
 

Pawan,
Can you elaborate more on the program where your are trying to embed the script in?
Regards,
Sandeep

On Sat, 28 Nov 2020, 13:48 Pawan Shriwas, <shriwa...@...> wrote:
Hey Jason,

Same thing happen with my as well where above script work well in gremlin console  but when we use it in java. we are not getting anything in process() section as callback. Could you help for the same.  


On Wednesday, 7 February 2018 at 20:28:41 UTC+5:30 Jason Plurad wrote:
It means that it will use the 'storage.backend' value as the storage. See the code in GraphDatabaseConfiguration.java. It looks like your only choice is 'default', and it seems like the option is there for the future possibility to use a different backend.

The code in the docs seemed to work ok, other than a minor change in the setStartTime() parameters. You can cut and paste this code into the Gremlin Console to use with the prepackaged distribution.

import java.util.concurrent.atomic.*;
import org.janusgraph.core.log.*;
import java.util.concurrent.*;

graph
= JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties');

totalHumansAdded
= new AtomicInteger(0);
totalGodsAdded
= new AtomicInteger(0);
logProcessor
= JanusGraphFactory.openTransactionLog(graph);
logProcessor
.addLogProcessor("addedPerson").
        setProcessorIdentifier
("addedPersonCounter").
        setStartTime
(Instant.now()).
        addProcessor
(new ChangeProcessor() {
           
public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
               
for (v in changeState.getVertices(Change.ADDED)) {
                   
if (v.label().equals("human")) totalHumansAdded.incrementAndGet();
                   
System.out.println("total humans = " + totalHumansAdded);
               
}
           
}
       
}).
        addProcessor
(new ChangeProcessor() {
           
public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
               
for (v in changeState.getVertices(Change.ADDED)) {
                   
if (v.label().equals("god")) totalGodsAdded.incrementAndGet();
                   
System.out.println("total gods = " + totalGodsAdded);
               
}
           
}
       
}).
        build
()

tx
= graph.buildTransaction().logIdentifier("addedPerson").start();
u
= tx.addVertex(T.label, "human");
u
.property("name", "proteros");
u
.property("age", 36);
tx
.commit();

If you inspect the keyspace in Cassandra afterwards, you'll see that a separate table is created for "ulog_addedPerson".

Did you have some example code of what you are attempting?



On Wednesday, February 7, 2018 at 5:55:58 AM UTC-5, Sandeep Mishra wrote:
Hi Guys,

We are trying to used transaction log feature of Janusgraph, which is not working as expected.No callback is received at
public void process(JanusGraphTransaction janusGraphTransaction, TransactionId transactionId, ChangeState changeState) {

Janusgraph documentation says value for log.[X].backend is 'default'.
Not sure what exactly it means. does it mean HBase which is being used as backend for data.

Please let  me know, if anyone has configured it.

Thanks and Regards,
Sandeep Mishra

--
You received this message because you are subscribed to a topic in the Google Groups "JanusGraph users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/janusgraph-users/JN4ZsB9_DMM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to janusgra...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/c4494822-6d82-4ba0-b769-bc49a4f80835n%40googlegroups.com.


Re: Configuring Transaction Log feature

Sandeep Mishra <sandy...@...>
 

Never worked with cassandra. Perhaps a base64 encoding is used for storing. 


On Wed, 29 Jul 2020, 00:25 anj...@..., <anjani...@...> wrote:
Thanks Sandeep, yes it works as you mentioned. 
We are using Cassandra as back-end and log tabled are created in it. Data in it are stored as blob type.

I was trying to read blob data type from Cassandra but getting below error 
"InvalidRequest: Error from server: code=2200 [Invalid query] message="In call to function system.blobastext, value 0xc00000000000000000f38503 is not a valid binary representation for type text"

My query : select blobastext(column1) from ulog_test;

How can we read data stored in ulog tables in Cassandra?

Thanks,
Anjani



On Saturday, 11 July 2020 at 13:49:02 UTC+5:30 sa...@... wrote:
Hi,

If you are using same identifier to start the logProcessor, there is no need to explicitly set previous time.
logProcessor keeps a marker of last record read. It should be able to recover from that point.

Do check again.

Regards,
Sandeep

On Thursday, July 9, 2020 at 9:25:17 PM UTC+8, anj...@... wrote:
Hi All,

We are using Janus graph with Cassandra. I am able to capture event using logProcessor and can see table created in Cassandra.

Was trying to figure out, if for some reason logProcessor stops then how to get changes which was done after logProcessor was stopped? 
I tried to start logProcessor by passing previous time thinking it will give all events which were done after that but it does not gave previous changes.


Thanks,
Anjani



On Sunday, 25 February 2018 at 16:01:06 UTC+5:30 sa...@... wrote:
Yeah Jason. I never bothered to look in Janusgraph table, expecting a new table to be created.
I can find a new column family in my setup too.

Thanks and Regards,
Sandeep


On Wednesday, February 21, 2018 at 12:09:14 AM UTC+8, Jason Plurad wrote:
I suppose it could be just confusion on the terminology:

Cassandra -> Keyspace -> Table
HBase -> Table -> Column Family

On Tuesday, February 20, 2018 at 11:05:10 AM UTC-5, Jason Plurad wrote:
Not sure what else to tell you. I just tried the same script from before against HBase 1.3.1, and it created the column family 'ulog_addedPerson' right after the logProcessor.addLogProcessor("addedPerson")...build()command was issued.

hbase(main):001:0> describe 'janusgraph'
Table janusgraph is ENABLED                                                                                                                          
janusgraph                                                                                                                                            
COLUMN FAMILIES DESCRIPTION                                                                                                                          
{NAME => 'e', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREV
ER'
, COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                  
{NAME => 'f', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREV
ER'
, COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                  
{NAME => 'g', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREV
ER'
, COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                  
{NAME => 'h', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREV
ER'
, COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                  
{NAME => 'i', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREV
ER'
, COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                  
{NAME => 'l', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '60480
0 SECONDS (7 DAYS)'
, COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                  
{NAME => 'm', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREV
ER'
, COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                  
{NAME => 's', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREV
ER'
, COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                  
{NAME => 't', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREV
ER'
, COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                  
{NAME => 'ulog_addedPerson', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE'
, TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                  
10 row(s) in 0.0230 seconds




On Tuesday, February 20, 2018 at 9:46:26 AM UTC-5, Sandeep Mishra wrote:
Hi Jason,

I tried it with Hbase backend, and I am getting control passed to change processor. 
Appreciate your help, on careful checking I notice that mutation was happening under default transaction initiated by Janusgraph, hence the issue.

However, the problem right now, I am unable to locate a table for data.
I have taken a snapshot of table in hbase using HBase shell before and after processing, but there is no new table created. 
Any idea what could be wrong? Is there as possibility that, its saving log data in janusgraph table meant for saving actual data?

Thanks and Regards,
Sandeep



On Sunday, February 18, 2018 at 11:56:49 PM UTC+8, Sandeep Mishra wrote:
Both groovy and java code works with backend as berkeleyje. Tomorrow in office i will try with Hbase as backend. 
Noted on your point.

Thanks and Regards,
Sandeep

On Sunday, February 18, 2018 at 11:14:48 PM UTC+8, Jason Plurad wrote:
You can use the same exact code in a simple Java program and prove that it works.
I'd think the main thing to watch out for is that your mutations are on a transaction that have the log identifier on it.
Is the Gremlin Server involved in your scenario?

tx = graph.buildTransaction().logIdentifier("addedPerson").start();


On Sunday, February 18, 2018 at 1:00:08 AM UTC-5, Sandeep Mishra wrote:
Hi Jason,

Thanks for a prompt reply.
Sample code attached below works well when executed from Gremlin console.
However, execution of Java version doesn't trigger callback. Probably something wrong with my code.
Unfortunately I can't copy code from my office machine.
I will check it again and keep you posted.

Regards,
Sandeep 

On Wednesday, February 7, 2018 at 10:58:41 PM UTC+8, Jason Plurad wrote:
It means that it will use the 'storage.backend' value as the storage. See the code in GraphDatabaseConfiguration.java. It looks like your only choice is 'default', and it seems like the option is there for the future possibility to use a different backend.

The code in the docs seemed to work ok, other than a minor change in the setStartTime() parameters. You can cut and paste this code into the Gremlin Console to use with the prepackaged distribution.

import java.util.concurrent.atomic.*;
import org.janusgraph.core.log.*;
import java.util.concurrent.*;

graph
= JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties');

totalHumansAdded
= new AtomicInteger(0);
totalGodsAdded
= new AtomicInteger(0);
logProcessor
= JanusGraphFactory.openTransactionLog(graph);
logProcessor
.addLogProcessor("addedPerson").
        setProcessorIdentifier
("addedPersonCounter").
        setStartTime
(Instant.now()).
        addProcessor
(new ChangeProcessor() {
           
public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
               
for (v in changeState.getVertices(Change.ADDED)) {
                   
if (v.label().equals("human")) totalHumansAdded.incrementAndGet();
                   
System.out.println("total humans = " + totalHumansAdded);
               
}
           
}
       
}).
        addProcessor
(new ChangeProcessor() {
           
public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
               
for (v in changeState.getVertices(Change.ADDED)) {
                   
if (v.label().equals("god")) totalGodsAdded.incrementAndGet();
                   
System.out.println("total gods = " + totalGodsAdded);
               
}
           
}
       
}).
        build
()

tx
= graph.buildTransaction().logIdentifier("addedPerson").start();
u
= tx.addVertex(T.label, "human");
u
.property("name", "proteros");
u
.property("age", 36);
tx
.commit();

If you inspect the keyspace in Cassandra afterwards, you'll see that a separate table is created for "ulog_addedPerson".

Did you have some example code of what you are attempting?


On Wednesday, February 7, 2018 at 5:55:58 AM UTC-5, Sandeep Mishra wrote:
Hi Guys,

We are trying to used transaction log feature of Janusgraph, which is not working as expected.No callback is received at
public void process(JanusGraphTransaction janusGraphTransaction, TransactionId transactionId, ChangeState changeState) {

Janusgraph documentation says value for log.[X].backend is 'default'.
Not sure what exactly it means. does it mean HBase which is being used as backend for data.

Please let  me know, if anyone has configured it.

Thanks and Regards,
Sandeep Mishra

--
You received this message because you are subscribed to a topic in the Google Groups "JanusGraph users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/janusgraph-users/JN4ZsB9_DMM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to janusgra...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/74e66713-d07c-4cca-a7f8-7faf225fc3c9n%40googlegroups.com.


Re: Configuring Transaction Log feature

Pawan Shriwas <shriwa...@...>
 

Hey Jason,

Same thing happen with my as well where above script work well in gremlin console  but when we use it in java. we are not getting anything in process() section as callback. Could you help for the same.  


On Wednesday, 7 February 2018 at 20:28:41 UTC+5:30 Jason Plurad wrote:
It means that it will use the 'storage.backend' value as the storage. See the code in GraphDatabaseConfiguration.java. It looks like your only choice is 'default', and it seems like the option is there for the future possibility to use a different backend.

The code in the docs seemed to work ok, other than a minor change in the setStartTime() parameters. You can cut and paste this code into the Gremlin Console to use with the prepackaged distribution.

import java.util.concurrent.atomic.*;
import org.janusgraph.core.log.*;
import java.util.concurrent.*;

graph
= JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties');

totalHumansAdded
= new AtomicInteger(0);
totalGodsAdded
= new AtomicInteger(0);
logProcessor
= JanusGraphFactory.openTransactionLog(graph);
logProcessor
.addLogProcessor("addedPerson").
        setProcessorIdentifier
("addedPersonCounter").
        setStartTime
(Instant.now()).
        addProcessor
(new ChangeProcessor() {
           
public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
               
for (v in changeState.getVertices(Change.ADDED)) {
                   
if (v.label().equals("human")) totalHumansAdded.incrementAndGet();
                   
System.out.println("total humans = " + totalHumansAdded);
               
}
           
}
       
}).
        addProcessor
(new ChangeProcessor() {
           
public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
               
for (v in changeState.getVertices(Change.ADDED)) {
                   
if (v.label().equals("god")) totalGodsAdded.incrementAndGet();
                   
System.out.println("total gods = " + totalGodsAdded);
               
}
           
}
       
}).
        build
()

tx
= graph.buildTransaction().logIdentifier("addedPerson").start();
u
= tx.addVertex(T.label, "human");
u
.property("name", "proteros");
u
.property("age", 36);
tx
.commit();

If you inspect the keyspace in Cassandra afterwards, you'll see that a separate table is created for "ulog_addedPerson".

Did you have some example code of what you are attempting?



On Wednesday, February 7, 2018 at 5:55:58 AM UTC-5, Sandeep Mishra wrote:
Hi Guys,

We are trying to used transaction log feature of Janusgraph, which is not working as expected.No callback is received at
public void process(JanusGraphTransaction janusGraphTransaction, TransactionId transactionId, ChangeState changeState) {

Janusgraph documentation says value for log.[X].backend is 'default'.
Not sure what exactly it means. does it mean HBase which is being used as backend for data.

Please let  me know, if anyone has configured it.

Thanks and Regards,
Sandeep Mishra


Re: throw NullPointerException when query with hasLabel() script

HadoopMarc <bi...@...>
 

Hi,

Do you mean that the only difference  between the first and the second script is the presence of the closing double quote? I see the stacktrace is from Gremlin Server: did you use Gremlin Console to fire the query? What happens if you make a GLV/bytecode connection instead of a connection for groovy scripts, that is using:

g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"))

Best wishes,    Marc

Op vrijdag 27 november 2020 om 19:05:24 UTC+1 schreef ouy...@...:

query with gremlin script:  "g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value')"
the returned result is expected and correct.

but query with gremlin script: "g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value')

throws NullPointerException.

the exception stack info is as follow:

2020-11-27 12:07:35,227 WARN  [gremlin-server-exec-10] AbstractEvalOpProcessor.java:311 - Exception processing a script on request [RequestMessage{, requestId=cb1f0ab6-cee6-47c4-bbc1-196f27b1b14c, op='eval', processor='', args={gremlin=g.E().hasLabel('L2_LINK').has('prop1', 'prop1-value'), batchSize=64}}].
java.lang.NullPointerException
        at org.janusgraph.graphdb.util.ElementHelper.getValues(ElementHelper.java:41)
        at org.janusgraph.graphdb.query.condition.PredicateCondition.evaluate(PredicateCondition.java:68)
        at org.janusgraph.graphdb.query.condition.And.evaluate(And.java:55)
        at org.janusgraph.graphdb.query.graph.GraphCentricQuery.matches(GraphCentricQuery.java:153)
        at org.janusgraph.graphdb.query.QueryProcessor.lambda$getFilterIterator$2(QueryProcessor.java:133)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:652)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
        at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
        at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at com.google.common.collect.Iterators$5.hasNext(Iterators.java:547)
        at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
        at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
        at org.janusgraph.graphdb.util.MultiDistinctOrderedIterator.hasNext(MultiDistinctOrderedIterator.java:75)
        at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:149)
        at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:197)
        at org.apache.tinkerpop.gremlin.server.op.AbstractOpProcessor.handleIterator(AbstractOpProcessor.java:146)
        at org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor.lambda$evalOpInternal$5(AbstractEvalOpProcessor.java:264)
        at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:278)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


and the problem is reproducible,  is it a bug?

enviroment:
janusgraph version: 0.5.2
backend cassandra version: 3.11


1301 - 1320 of 6661