
Re: Index not being used with "Between" clause

Gene Fojtik <genef...@...>
 

Outstanding - thank you Jason.

-gene


On Thursday, June 8, 2017 at 11:47:53 PM UTC-5, Jason Plurad wrote:
Make sure you're using a mixed index for numeric range queries. Composite indexes are best for exact matching. The console session below shows the difference:

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-lucene.properties')
==>standardjanusgraph[berkeleyje:/usr/lib/janusgraph-0.1.1-hadoop2/conf/../db/berkeley]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@1c8f6a90
gremlin> lat = mgmt.makePropertyKey('lat').dataType(Integer.class).make()
==>lat
gremlin> latidx = mgmt.buildIndex('latidx', Vertex.class).addKey(lat).buildCompositeIndex()
==>latidx
gremlin> lon = mgmt.makePropertyKey('lon').dataType(Integer.class).make()
==>lon
gremlin> lonidx = mgmt.buildIndex('lonidx', Vertex.class).addKey(lon).buildMixedIndex('search')
==>lonidx
gremlin> mgmt.commit()
==>null
gremlin> v = graph.addVertex('code', 'rdu', 'lat', 35, 'lon', -78)
==>v[4184]
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[berkeleyje:/usr/lib/janusgraph-0.1.1-hadoop2/conf/../db/berkeley], standard]
gremlin> g.V().has('lat', 35)
==>v[4184]
gremlin> g.V().has('lat', between(34, 36))
00:40:33 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(lat >= 34 AND lat < 36)]. For better performance, use indexes
==>v[4184]
gremlin> g.V().has('lon', -78)
==>v[4184]
gremlin> g.V().has('lon', between(-79, -77))
==>v[4184]

-- Jason

On Wednesday, June 7, 2017 at 12:01:31 PM UTC-4, Gene Fojtik wrote:
Hello,

Have an index on the property "latitude"; when using it with the between clause, the index is not being utilized.

g.V().has("latitude", 33.333)  works well, however

g.V().has("latitude", between(33.889, 33.954))  does not use the indexes..

Any assistance would be appreciated..

-g


call queue is full on /0.0.0.0:60020, too many items queued? hbase

aoz...@...
 

Here is my problem:

We are using Cloudera 5.7.0 with Java 1.8.0_74, and we have Spark 1.6.0, JanusGraph 0.1.1, and HBase 1.2.0.

I am trying to load 200 GB of graph data, and for that I run the following code in the Gremlin shell:

:load data/call-janusgraph-schema-groovy
writeGraphPath='conf/my-janusgraph-hbase.properties'
writeGraph=JanusGraphFactory.open(writeGraphPath)
defineCallSchema(writeGraph)
writeGraph.close()

readGraph=GraphFactory.open('conf/hadoop-graph/hadoop-call-script.properties')
gRead=readGraph.traversal()
gRead.V().valueMap()

//so far so good everything works perfectly

blvp=BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).intermediateBatchSize(10000).writeGraph(writeGraphPath).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(512).program(blvp).submit().get()

It starts executing the Spark job; Stage-0 runs smoothly, however at Stage-1 I get an exception:

org.hbase.async.CallQueueTooBigException: Call queue is full on /0.0.0.0:60020, too many items queued ?

However, Spark recovers the failed tasks and completes Stage-1, and then Stage-2 completes flawlessly. Since Spark persists the previous results in memory, Stage-3 and Stage-4 are skipped and Stage-5 is started; Stage-5 hits the same CallQueueTooBigException exceptions, but Spark nevertheless recovers again.

My problem is that this stage (Stage-5) takes too long to execute. It took 14 hours on my last run and I killed the Spark job. I think this is really odd for so little input data (200 GB). Normally my cluster is fast enough that I can load 3 TB of data into HBase (with bulk loading via MapReduce) in 1 hour. I tried to increase the number of workers:

readGraph.compute(SparkGraphComputer).workers(1024).program(blvp).submit().get()

however this time the number of CallQueueTooBigException exceptions was so high that they did not let the Spark job recover.

Is there any way that I can decrease the runtime of the job?


Below is extra material that may help track down the source of the problem.

Here is how I start the Gremlin shell:

#!/bin/bash

export JAVA_HOME=/mnt/hdfs/jdk.1.8.0_74
export HADOOP_CONF_DIR=/etc/hadoop/conf.cloudera.yarn
export YARN_HOME=/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop-yarn
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/spark
export SPARK_CONF_DIR=$SPARK_HOME/conf


GREMLINHOME=/mnt/hdfs/janusgraph-0.1.1-hadoop2

export CLASSPATH=$YARN_HOME/*:$YARN_CONF_DIR:$SPARK_HOME/lib/*:$SPARK_CONF_DIR:$CLASSPATH

cd $GREMLINHOME
export GREMLIN_LOG_LEVEL=info
exec $GREMLINHOME/bin/gremlin.sh $*




and here is my conf/hadoop-graph/hadoop-call-script.properties file:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.GraphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat
gremlin.hadoop.inputLocation=/user/hive/warehouse/tablex/000000_0
gremlin.hadoop.scriptInputFormat.script=/user/me/janus/script-input-call.groovy
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true

spark.driver.maxResultSize=8192
spark.yarn.executor.memoryOverhead=5000
spark.executor.cores=1
spark.executor.instances=1024
spark.master=yarn-client
spark.executor.memory=20g
spark.driver.memory=20g
spark.serializer=org.apache.spark.serializer.JavaSerializer


conf/my-janusgraph-hbase.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
storage.hbase.region-count=1024
cluster.max-partitions=1024
cluster.partition=true

ids.block-size=10000
storage.buffer-size=10000
storage.transactions=false
ids.num-partitions=1024

storage.hbase.table=myjanus
storage.hostname=x.x.x.x
cache.db-cache=true
cache.db-cache-clean-wait=20
cache.db-cache-time=180000
cache.db-cache-size=0.5



Thx in advance,
Ali


Re: Production users of JanusGraph

anurag <anurag...@...>
 

Hi Misha ,
Thanks a lot for your response and the useful information; much appreciated.
Thanks ,
Anurag

On Fri, May 26, 2017 at 2:28 PM, Misha Brukman <mbru...@...> wrote:
Hi Anurag,

I started a list of companies using JanusGraph in production; you can see the current list here: https://github.com/JanusGraph/janusgraph#users (and the logos at the bottom of http://janusgraph.org) and more additions are on the way.

They appear to be happy with JanusGraph, but I'll let them chime in if they want to provide any additional details.

BTW, if anyone else is a production user of JanusGraph, please get in touch with me and let's get you added as well!

Misha

On Wed, Apr 5, 2017 at 12:28 PM, anurag <anurag...@...> wrote:
All,
Many thanks to the folks who were involved in setting up the JanusGraph project. We are using Titan as the graph DB for a beta feature; the reason our feature is in beta is that we were not sure where Titan was headed. Now that we have JanusGraph we would like to move to it. Are there any users of JanusGraph in production? If so, can you please share your experiences with it?
Many thanks.
Best,
Anurag




Re: Another perspective on JanusGraph embedded versus server mode

Ted Wilmes <twi...@...>
 

Hi Jamie,
Good question, and I dig the ASCII art. To answer your question, they will describe the same graph, just as if you were running the Janus instances in their own JVMs. I've used both approaches. The embedded approach was attractive initially because I could write Gremlin traversals without passing strings to the driver. Undoubtedly there would be some performance benefit because you're cutting out a network hop, but whether or not that would be appreciable depends on the latency targets you're trying to hit. My guess is that for most folks, it won't make nearly as much of a difference as the latencies you're seeing between Janus and the Cassandra cluster. At this point, I prefer to deploy Janus like a standalone database, for a few reasons. First, with the introduction of TinkerPop's remote graph and Gremlin Language Variants, you can still get that embedded feel with the driver [1]. Second, I like to be able to scale and tune the Janus DB components separately from the API. Finally, maybe less of an issue, but dependency conflicts between the API and an embedded Janus can be a pain; not insurmountable, but that goes away if you put a driver in between.
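
For what it's worth, here's a rough sketch of that remote style (the host name, port 8182, and the server-side traversal source name 'g' are assumptions for illustration, not something from your setup):

import org.apache.tinkerpop.gremlin.driver.Cluster
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
import org.apache.tinkerpop.gremlin.structure.util.empty.EmptyGraph

cluster = Cluster.build('janusgraph-server-host').port(8182).create()
// traversals are composed locally, just like the embedded case, but execute on the server
g = EmptyGraph.instance().traversal().withRemote(DriverRemoteConnection.using(cluster, 'g'))
g.V().has('code', 'rdu').values('lat', 'lon').toList()
cluster.close()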

--Ted


On Friday, June 9, 2017 at 1:32:27 AM UTC-5, Jamie Lawson wrote:
We have a _domain specific_ REST API that is architecturally decoupled from JanusGraph. In other words, users of the REST API have no indication that their calls interact with JanusGraph, or even with a graph. These REST calls have a lot of interactions with the JanusGraph database which is currently embedded in the same JVM process. Here is a deployment view:


+-----------------------------------------+    +-----------------+
| JVM Process #1                          |    | JVM Process #2  |
|                                         |    |                 |
|  +-----------------+    +------------+  |    |  +-----------+  |
|  | Domain Specific |----| JanusGraph |--+----+--| Cassandra |  |
|  |    REST API     |    |  Embedded  |  |    |  |  Backend  |  |
|  +-----------------+    +------------+  |    |  +-----------+  |
|                                         |    |                 |
+-----------------------------------------+    +-----------------+


Now consider load balancing. The REST API is the only way we want to access the graph database. That's what keeps it "operationally consistent". If all updates are through the REST API, we will not get stuff in the database that doesn't make sense in the context of the domain. As we expand, is there a good reason to break out JVM Process #1 so that we have something that looks like this, with JanusGraph Server in a separate process:


+----------------------+    +-----------------+    +-----------------+
| JVM Process #1A      |    | JVM Process #1B |    | JVM Process #2  |
|                      |    |                 |    |                 |
|  +-----------------+ |    | +------------+  |    |  +-----------+  |
|  | Domain Specific |-+----+-| JanusGraph |--+----+--| Cassandra |  |
|  |    REST API     | |    | |   SERVER   |  |    |  |  Backend  |  |
|  +-----------------+ |    | +------------+  |    |  +-----------+  |
|                      |    |                 |    |                 |
+----------------------+    +-----------------+    +-----------------+

My expectation would be that connecting to JanusGraph through the embedded API would be much faster than connecting through a WebSocket API. Is that the case?

Now as we expand, is it reasonable to run our REST endpoint with an embedded JanusGraph in the same process and replicate that process with all of the embedded JanusGraphs talking to the same Cassandra backend, something like this:


+-----------------------------------------+
| JVM Process #1.1 on Node #1             |
|                                         |
|  +-----------------+    +------------+  |
|  | Domain Specific |----| JanusGraph |--+--------------+
|  | REST API endpt 1|    |  Embedded  |  |              |
|  +-----------------+    +------------+  |              |
|                                         |              |
+-----------------------------------------+              |
                                                         |
+-----------------------------------------+    +^^^^^^^^^|^^^^^^^+
| JVM Process #1.2 on Node #2             |    { Cluster Process }
|                                         |    {         |       }
|  +-----------------+    +------------+  |    {  +-----------+  }
|  | Domain Specific |----| JanusGraph |--+----+--| Cassandra |  }
|  | REST API endpt 2|    |  Embedded  |  |    {  |  Backend  |  }
|  +-----------------+    +------------+  |    {  +-----------+  }
|                                         |    {         |       }
+-----------------------------------------+    +^^^^^^^^^|^^^^^^^+
                                                         |
+-----------------------------------------+              |
| JVM Process #1.3 on Node #3             |              |
|                                         |              |
|  +-----------------+    +------------+  |              |
|  | Domain Specific |----| JanusGraph |--+--------------+
|  | REST API endpt 3|    |  Embedded  |  |
|  +-----------------+    +------------+  |
|                                         |
+-----------------------------------------+


The real question here is, if different embedded JanusGraphs have the same backend, do they describe the same graph (modulo eventual consistency)? I expect that they will have different stuff in cache, but will they describe the same graph?

And is there an expectation of a performance advantage if we break out the JanusGraph part and separate it from the REST API (running as JanusGraph Server), understanding that all interaction with the graph will be through the REST API, given that each REST call may make a number of sequential JanusGraph (Gremlin) calls?


Another perspective on JanusGraph embedded versus server mode

Jamie Lawson <jamier...@...>
 

We have a _domain specific_ REST API that is architecturally decoupled from JanusGraph. In other words, users of the REST API have no indication that their calls interact with JanusGraph, or even with a graph. These REST calls have a lot of interactions with the JanusGraph database which is currently embedded in the same JVM process. Here is a deployment view:


+-----------------------------------------+    +-----------------+
| JVM Process #1                          |    | JVM Process #2  |
|                                         |    |                 |
|  +-----------------+    +------------+  |    |  +-----------+  |
|  | Domain Specific |----| JanusGraph |--+----+--| Cassandra |  |
|  |    REST API     |    |  Embedded  |  |    |  |  Backend  |  |
|  +-----------------+    +------------+  |    |  +-----------+  |
|                                         |    |                 |
+-----------------------------------------+    +-----------------+


Now consider load balancing. The REST API is the only way we want to access the graph database. That's what keeps it "operationally consistent". If all updates are through the REST API, we will not get stuff in the database that doesn't make sense in the context of the domain. As we expand, is there a good reason to break out JVM Process #1 so that we have something that looks like this, with JanusGraph Server in a separate process:


+----------------------+    +-----------------+    +-----------------+
| JVM Process #1A      |    | JVM Process #1B |    | JVM Process #2  |
|                      |    |                 |    |                 |
|  +-----------------+ |    | +------------+  |    |  +-----------+  |
|  | Domain Specific |-+----+-| JanusGraph |--+----+--| Cassandra |  |
|  |    REST API     | |    | |   SERVER   |  |    |  |  Backend  |  |
|  +-----------------+ |    | +------------+  |    |  +-----------+  |
|                      |    |                 |    |                 |
+----------------------+    +-----------------+    +-----------------+

My expectation would be that connecting to JanusGraph through the embedded API would be much faster than connecting through a WebSocket API. Is that the case?

Now as we expand, is it reasonable to run our REST endpoint with an embedded JanusGraph in the same process and replicate that process with all of the embedded JanusGraphs talking to the same Cassandra backend, something like this:


+-----------------------------------------+
| JVM Process #1.1 on Node #1             |
|                                         |
|  +-----------------+    +------------+  |
|  | Domain Specific |----| JanusGraph |--+--------------+
|  | REST API endpt 1|    |  Embedded  |  |              |
|  +-----------------+    +------------+  |              |
|                                         |              |
+-----------------------------------------+              |
                                                         |
+-----------------------------------------+    +^^^^^^^^^|^^^^^^^+
| JVM Process #1.2 on Node #2             |    { Cluster Process }
|                                         |    {         |       }
|  +-----------------+    +------------+  |    {  +-----------+  }
|  | Domain Specific |----| JanusGraph |--+----+--| Cassandra |  }
|  | REST API endpt 2|    |  Embedded  |  |    {  |  Backend  |  }
|  +-----------------+    +------------+  |    {  +-----------+  }
|                                         |    {         |       }
+-----------------------------------------+    +^^^^^^^^^|^^^^^^^+
                                                         |
+-----------------------------------------+              |
| JVM Process #1.3 on Node #3             |              |
|                                         |              |
|  +-----------------+    +------------+  |              |
|  | Domain Specific |----| JanusGraph |--+--------------+
|  | REST API endpt 3|    |  Embedded  |  |
|  +-----------------+    +------------+  |
|                                         |
+-----------------------------------------+


The real question here is, if different embedded JanusGraphs have the same backend, do they describe the same graph (modulo eventual consistency)? I expect that they will have different stuff in cache, but will they describe the same graph?

And is there an expectation of a performance advantage if we break out the JanusGraph part and separate it from the REST API (running as JanusGraph Server), understanding that all interaction with the graph will be through the REST API, given that each REST call may make a number of sequential JanusGraph (Gremlin) calls?


Re: Index not being used with "Between" clause

Jason Plurad <plu...@...>
 

Make sure you're using a mixed index for numeric range queries. Composite indexes are best for exact matching. The console session below shows the difference:

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-lucene.properties')
==>standardjanusgraph[berkeleyje:/usr/lib/janusgraph-0.1.1-hadoop2/conf/../db/berkeley]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@1c8f6a90
gremlin> lat = mgmt.makePropertyKey('lat').dataType(Integer.class).make()
==>lat
gremlin> latidx = mgmt.buildIndex('latidx', Vertex.class).addKey(lat).buildCompositeIndex()
==>latidx
gremlin> lon = mgmt.makePropertyKey('lon').dataType(Integer.class).make()
==>lon
gremlin> lonidx = mgmt.buildIndex('lonidx', Vertex.class).addKey(lon).buildMixedIndex('search')
==>lonidx
gremlin> mgmt.commit()
==>null
gremlin> v = graph.addVertex('code', 'rdu', 'lat', 35, 'lon', -78)
==>v[4184]
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[berkeleyje:/usr/lib/janusgraph-0.1.1-hadoop2/conf/../db/berkeley], standard]
gremlin> g.V().has('lat', 35)
==>v[4184]
gremlin> g.V().has('lat', between(34, 36))
00:40:33 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(lat >= 34 AND lat < 36)]. For better performance, use indexes
==>v[4184]
gremlin> g.V().has('lon', -78)
==>v[4184]
gremlin> g.V().has('lon', between(-79, -77))
==>v[4184]

-- Jason


On Wednesday, June 7, 2017 at 12:01:31 PM UTC-4, Gene Fojtik wrote:
Hello,

Have an index on the property "latitude"; when using it with the between clause, the index is not being utilized.

g.V().has("latitude", 33.333)  works well, however

g.V().has("latitude", between(33.889, 33.954))  does not use the indexes..

Any assistance would be appreciated..

-g


Re: How to list properties from gremlin console

Jason Plurad <plu...@...>
 

Hi Gene,

Never tried to do it before seeing your post, but this seems workable:

gremlin> import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.*
==>groovy.grape.Grape, org.apache.commons.configuration.*, ...
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties')
==>standardjanusgraph[cassandrathrift:[127.0.0.1]]
gremlin> graph.getConfiguration().getConfiguration().getSubset(STORAGE_NS)
==>lock.local-mediator-group=CassandraThriftStoreManagerjanusgraph
==>hostname=127.0.0.1
==>backend=cassandrathrift
gremlin> graph.getConfiguration().getConfiguration().getSubset(INDEX_NS)
==>search.elasticsearch.client-only=true
==>search.backend=elasticsearch
==>search.hostname=127.0.0.1

-- Jason


On Tuesday, June 6, 2017 at 5:46:09 PM UTC-4, Gene Fojtik wrote:
Is there a way to list the current graph properties from the gremlin console?  Looking for the properties set in the *.properties file that have been sourced on start-up.

-gene


Re: Indexes stuck in INSTALLED status

Brandon Dean <engr...@...>
 

Thanks Rafael, I did see your post and I gave those settings a shot but that alone didn't seem to resolve the problem.  Following the steps in my last post seems to be a repeatable solution to getting indexes out of the INSTALLED status though.

I did some more testing on the disable process and it worked fine this time as long as I:
  1. performed a commit after issuing the DISABLE_INDEX
  2. ensured that I had no open transactions
  3. waited patiently for the logs to indicate that the index had in fact been disabled before trying to perform any other operations on the index
At this point I feel like I know how to get things back on track, even though I still don't have a good understanding of how they got off track in the first place.  Thanks again for your response!
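
For reference, here is a rough gremlin sketch of that sequence (the index name is the 'byNameComposite' one from my other posts; treat it as an outline, not a verified recipe):

mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex('byNameComposite'), SchemaAction.DISABLE_INDEX).get()
mgmt.commit()
graph.tx().commit()   // make sure this session isn't holding a transaction open
// wait for the index to actually report DISABLED before doing anything else with it
mgmt = graph.openManagement()
mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').status(SchemaStatus.DISABLED).call()
mgmt.rollback()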


Re: [WARNING] 0.1.0 to 0.1.1 upgrade

Alexander Patrikalakis <amcpatr...@...>
 

It seems like 0.2.0 is on the horizon for June/early July. If the next TinkerPop release slips, we may consider a 0.1.2 release to fix the JG 0.1.0 compatibility issue and include the following adjustment, which the DynamoDB backend would like to have:
https://github.com/JanusGraph/janusgraph/commit/5d9ebb46c0e11c57f9336e9f38057e6e3b2e49a5

Alex


On Thursday, May 18, 2017 at 12:19:05 PM UTC-10, Ted Wilmes wrote:
Hello,
A step was missed during the release prep and the 0.1.1 release was not marked as compatible with JanusGraph 0.1.0. Consequently, if you point 0.1.1 at a previously loaded 0.1.0 backend (not Titan), JanusGraph will not start and you'll get: 

"StorageBackend version is incompatible with current JanusGraph version: storage [0.1.0] vs. runtime [0.1.1]"

The backend format is not really incompatible and an issue has been entered to address this [1].

The 0.1.1 release mainly fixes a serious Titan compatibility issue [2] and if you are upgrading from Titan, you will not be affected by this incompatibility error and can upgrade because Janus is correctly marked as compatible with Titan 1.0 and Titan 1.1-SNAPSHOT.

If you are running 0.1.0 right now from a brand new JanusGraph load (not an upgrade from Titan), hold off on upgrading until this issue is taken care of. If you planned on rebuilding your data from scratch when upgrading to 0.1.1, you can go ahead and upgrade, as this only affects a previously loaded system.

Thanks,
Ted


Re: Indexes stuck in INSTALLED status

Rafael Fernandes <luizr...@...>
 

I've helped another user who asked me in private about exactly the same issue as yours, and my solution worked for him... you may want to try it as well:

Let me know if it works so I can push a "workaround" section to the documentation...
See my email below.

=======
Reindexing has been tough for me too, so I had to write a few wrapper classes and do tons of debugging to make sure I didn't miss anything. What I found out was that some of my transactions were still open; remember, Titan/Janus opens several threads, and they might have opened transactions that you don't know about...
I'd suggest the following, if you can (a small sketch follows the list):
1) stop all your app servers
2) use the gremlin console or write a simple java program
3) execute the steps in 29.1.5.1, http://docs.janusgraph.org/latest/index-admin.html
3.1) set storage.parallel-backend-ops=false
3.2) set ids.num-partitions=1
3.3) depending on how large your graph is, this shouldn't take long...
4) once your indexes are done, bring up your servers.
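
A quick gremlin-console sketch of steps 2-3 (the index name below is just a placeholder, and I'm assuming 3.1/3.2 are already set in the properties file the graph is opened with):

graph = JanusGraphFactory.open('conf/my-janusgraph.properties')   // hypothetical properties file with the settings above
mgmt = graph.openManagement()
// kick off the reindex job and block until it finishes
mgmt.updateIndex(mgmt.getGraphIndex('byNameComposite'), SchemaAction.REINDEX).get()
mgmt.commit()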

That should do the trick, it did for me...
Enjoy,
Rafa


Index not being used with "Between" clause

Gene Fojtik <genef...@...>
 

Hello,

Have an index on the property "latitude"; when using it with the between clause, the index is not being utilized.

g.V().has("latitude", 33.333)  works well, however

g.V().has("latitude", between(33.889, 33.954))  does not use the indexes..

Any assistance would be appreciated..

-g


Re: Indexes stuck in INSTALLED status

Brandon Dean <engr...@...>
 

After some further investigation, I realized I did apparently have one transaction that was holding things up.  Using graph.tx().rollback() had no effect, so I used this method, which I found after some searching:

gremlin> size = graph.getOpenTransactions().size();

==>1

gremlin> for(i=0;i<size;i++) {graph.getOpenTransactions().getAt(0).rollback()}

==>null


That alone didn't make any difference but after doing that I ran the following commands and my index finally moved from INSTALLED to REGISTERED:

gremlin> mgmt = graph.openManagement()

gremlin> byName= mgmt.getGraphIndex('byNameComposite');

gremlin> propkey = mgmt.getPropertyKey('name');

gremlin> byName.getIndexStatus(propkey);

==>INSTALLED

gremlin> mgmt.updateIndex(byName, SchemaAction.REGISTER_INDEX).get()

gremlin> byName.getIndexStatus(propkey);

==>INSTALLED

gremlin> mgmt.commit()

gremlin> mgmt = graph.openManagement()

gremlin> byName.getIndexStatus(propkey);

==>INSTALLED

gremlin>  mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call()

==>GraphIndexStatusReport[success=true, indexName='byNameComposite', targetStatus=REGISTERED, notConverged={}, converged={name=REGISTERED}, elapsed=PT0.008S]


It's not clear to me whether calling awaitGraphIndexStatus actually triggered anything or whether (as I suspect) it just monitors the status and I needed to wait a little longer.
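
If it is just monitoring, then something like the following sketch, which simply waits longer for convergence (the 10-minute timeout is an arbitrary value, not something I've verified), may be all that's needed:

gremlin> graph.openManagement().awaitGraphIndexStatus(graph, 'byNameComposite').status(SchemaStatus.REGISTERED).timeout(10, java.time.temporal.ChronoUnit.MINUTES).call()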

I also need to do some more testing with the disable process because none of this addresses the fact that when I tried to disable my index it got stuck in the INSTALLED state rather than moving to a DISABLED state as I expected.

On Tuesday, June 6, 2017 at 2:35:15 PM UTC-5, Brandon Dean wrote:
I've been struggling with my indexes getting stuck in the INSTALLED status with no apparent way to get them out of that status.  After much frustration, I started with a completely fresh install of JanusGraph 0.1.1, left the configuration as is, and then followed the steps from the documentation exactly to create a new composite index.  I was able to successfully create my index and it moved into the REGISTERED status and then the ENABLED status after issuing a REINDEX command.

1490727 [gremlin-server-session-1] INFO  org.janusgraph.graphdb.database.management.GraphIndexStatusWatchemposite do not currently have status REGISTERED: name=ENABLED

I then attempted to follow the documentation to delete this same index using the following steps:

gremlin> :remote connect tinkerpop.server /opt/vdp/janus/conf/remote.yaml session
gremlin> :remote console
gremlin> m = graph.openManagement()
gremlin> i = m.getGraphIndex('byNameComposite')
gremlin> m.updateIndex(i, SchemaAction.DISABLE_INDEX).get()
gremlin> m.commit()


Rather than moving to a DISABLED state though, my index is now back in the INSTALLED state and I am unable to move it to REGISTERED or DISABLED.  I've tried issuing additional commands to DISABLE or even REGISTER the index but they all time out and the status never changes from INSTALLED.

gremlin> m = graph.openManagement()
gremlin> i = m.getGraphIndex('byNameComposite')
gremlin> i.getIndexStatus(m.getPropertyKey('name'))
==>INSTALLED
gremlin> m.rollback()
gremlin> m.updateIndex(i, SchemaAction.REMOVE_INDEX).get()
Update action [REMOVE_INDEX] cannot be invoked for index with status [INSTALLED]

I'm currently the only user on this server.  I've performed rollbacks just to be certain and ensured that there are no other open instances.  Anytime I issue an awaitGraphIndexStatus it times out and the status never changes.  What's the proper way to get this index to move out of the INSTALLED status?  Alternatively, what might be blocking that from occurring if it is supposed to happen automatically?

gremlin> graph.getOpenTransactions()
==>standardjanusgraphtx[0x18489ed6]
gremlin> graph.tx().rollback()
==>null
gremlin> graph.getOpenTransactions()
==>standardjanusgraphtx[0x18489ed6]
gremlin> graph.openManagement().getOpenInstances()
==>0a7f01141301102-dcwidphiat002-edc-nam-gm-com1(current)
gremlin> graph.openManagement().awaitGraphIndexStatus(graph, 'byNameComposite').call()
Script evaluation exceeded the configured 'scriptEvaluationTimeout' threshold of 30000 ms or evaluation wa request [graph.openManagement().awaitGraphIndexStatus(graph, 'byNameComposite').call()]: sleep interrupte

Any help for the new guy (me) would be very much appreciated!



Re: Geo Data

JZ <zamb...@...>
 

Thanks for the feedback, very helpful.


On Wednesday, June 7, 2017 at 9:33:07 AM UTC-4, JZ wrote:

(1) I was looking at the geo facilities; does someone know what algorithm or approach the implementation uses for geo location calculations?


Re: Geo Data

Robert Dale <rob...@...>
 

If it's an index query, it should use the index backend. If it's mid-traversal, it will likely use a geo predicate, which uses Spatial4j [1]. ES and Solr use Lucene, and Lucene also uses Spatial4j, so I think in the end everything uses Spatial4j. They all also use JTS [2] to some degree.

The Java Topology Suite (JTS) is currently required to use line, polygon, multi-point, multi-line and multi-polygon geometries. JTS is not included in JanusGraph distributions by default due to its LGPL license. Users must download the JTS JAR file separately and include it in the classpath when full geometry support is required. [3]
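
For example, a mid-traversal geo predicate looks roughly like this (the 'place' property key and the coordinates are made up for illustration); when a mixed index covers the key it is answered by the index backend, otherwise it is evaluated in memory:

g.V().has('place', geoWithin(Geoshape.circle(37.97, 23.72, 50)))   // circle takes lat, lon, radius in km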



Robert Dale

On Wed, Jun 7, 2017 at 1:44 AM, JZ <zamb...@...> wrote:

(1) I was looking at the geo facilities; does someone know what algorithm or approach the implementation uses for geo location calculations?



Re: Who is using JanusGraph in production?

Liu-Cheng Xu <xuliuc...@...>
 

Awesome! Could you also share the links here when ready?

Tunay Gür <tuna...@...> wrote on Wednesday, June 7, 2017 at 6:47 AM:

We (Uber) are using JG in production at the moment. We've recently started to contribute some of our code back, and we plan to publish our learnings, benchmarks, etc. as a series of blog posts.


On Monday, June 5, 2017 at 11:53:37 AM UTC-7, Misha Brukman wrote:
Great point! I welcome case studies and in-depth descriptions, but those take a significant effort to write as well as get approved by appropriate PR/Legal/etc. departments, so while I am always advocating for these, it's not always going to be possible.

In the meantime, here's one from CELUM (will be added shortly to the website): https://www.celum.com/en/graph-driven-and-reactive-architecture and I hope we'll be adding more of these in the future.

On Mon, Jun 5, 2017 at 2:46 PM, Michael Markieta <ma...@...> wrote:
It would be great if the current user list also said a little bit about how each company uses it. It's impossible to tell how FiNQ and Seeq are using JanusGraph. We would benefit from having case-study-like material for others to look at (in due time).

On Saturday, 27 May 2017 08:13:17 UTC-4, Jimmy wrote:
Great! Thank you for your work!

Misha Brukman <mb...@...> wrote on Saturday, May 27, 2017 at 5:29 AM:
Hi Jimmy,

I started building a list of companies using JanusGraph in production; you can see the current list here: https://github.com/JanusGraph/janusgraph#users (and the logos at the bottom of http://janusgraph.org) and more additions are on the way.

They appear to be happy with JanusGraph, but I'll let them chime in if they want to provide any additional details.

BTW, if anyone else is a production user of JanusGraph, please get in touch with me and let's get you added on the list as well!

Misha
On Fri, Apr 7, 2017 at 4:03 AM, Jimmy <xul...@...> wrote:
Lovely and promising project! Is anyone using JanusGraph in production at present? Thanks!

--
Liu-Cheng Xu

--
Liu-Cheng Xu


Re: Character case behaviour different with or without indices

ni...@...
 

Ok, thanks. That is good to know.


Re: Character case behaviour different with or without indices

tpr...@...
 

If you want to use both, you have to use TEXTSTRING mapping.
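
A rough sketch of what that looks like when building the mixed index (the property key, index name, and the 'search' backend name are placeholders):

import org.janusgraph.core.schema.Mapping
mgmt = graph.openManagement()
name = mgmt.makePropertyKey('name').dataType(String.class).make()
// TEXTSTRING indexes the value both as tokenized text and as an untouched string,
// so textContains* and textRegex/eq predicates can all use the index
mgmt.buildIndex('nameMixed', Vertex.class).addKey(name, Mapping.TEXTSTRING.asParameter()).buildMixedIndex('search')
mgmt.commit()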


Re: Character case behaviour different with or without indices

ni...@...
 

Great.

I was using STRING and TEXT indices and janus was only finding the TEXT one. I stopped using the TEXT index and it started to behave as expected. 
I had assumed that janus would use the right index based on the query (textRegex vs textContainsRegex).

Thanks for your help.


Geo Data

JZ <zamb...@...>
 

(1) I was looking at the geo facilities; does someone know what algorithm or approach the implementation uses for geo location calculations?


Re: Character case behaviour different with or without indices

tpr...@...
 

Are you using DEFAULT mapping or no mapping?
If so, DEFAULT mapping (or no mapping) uses the TEXT mapping, which is bound to the Text ES datatype (https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) and is lowercased by default.

So, if you use the STRING mapping, which is bound to the Keyword ES datatype and left untouched by default, it should work.
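
For example (a sketch only; the key and index names are placeholders):

import org.janusgraph.core.schema.Mapping
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
mgmt.buildIndex('nameExact', Vertex.class).addKey(name, Mapping.STRING.asParameter()).buildMixedIndex('search')
mgmt.commit()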


On Tuesday, June 6, 2017 at 23:24:46 UTC+2, Nigel Brown wrote:
Elasticsearch 5.1.1
