
Re: Indexes on edge properties

Jason Plurad <plu...@...>
 

Did you create the index before inserting data? Did you make sure to commit the management transaction, i.e. mgmt.commit()?

In the example below, you can see a name property being created and indexed for edges.

gremlin> graph = JanusGraphFactory.open('inmemory')
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@222afc67
gremlin> name = mgmt.makePropertyKey("name").dataType(String.class).make()
==>name
gremlin> mgmt.buildIndex("byName", Edge.class).addKey(name).buildCompositeIndex()
==>byName
gremlin> mgmt.commit()
==>null
gremlin> v0 = graph.addVertex('title', 'dr')
==>v[4288]
gremlin> v1 = graph.addVertex('title', 'mrs')
==>v[4120]
gremlin> v0.addEdge('to', v1, 'name', 'foo')
==>e[17c-3b4-36d-36g][4288-to->4120]
gremlin> graph.tx().commit()
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
gremlin> g.E().has('name', 'foo')
==>e[17c-3b4-36d-36g][4288-to->4120]
gremlin> g.E().has('title', 'foo')
10:03:41 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(title = foo)]. For better performance, use indexes

As for the "iterating over all vertices" wording: the message is hardcoded to say "vertices". Perhaps "elements" would be better.

-- Jason

On Tuesday, July 4, 2017 at 9:47:29 AM UTC-4, Thijs Broersen wrote:
I have set an index on an edge property, but when I execute
g.E().has("name", "somename")
it does not use the index and gives me: "Query requires iterating over all vertices..." I created the index using:
name = mgmt.makePropertyKey("name").dataType(String.class).make()
mgmt.buildIndex("byName", Edge.class).addKey(name).buildCompositeIndex()
Any suggestions? Why does it say "iterating over all vertices" instead of "iterating over all edges"?


Indexes on edge properties

Thijs Broersen <mht.b...@...>
 

I have set an index on an edge property, but when I execute
g.E().has("name", "somename")
it does not use the index and gives me: "Query requires iterating over all vertices..." I created the index using:
name = mgmt.makePropertyKey("name").dataType(String.class).make()
mgmt.buildIndex("byName", Edge.class).addKey(name).buildCompositeIndex()
Any suggestions? Why does it say "iterating over all vertices" instead of "iterating over all edges"?


Handling backend connection issues on Gremlin-Server start up

Carlos <512.qua...@...>
 

So I'm using Cassandra as my backend, and I've noticed that if I accidentally start my services out of order, Gremlin-Server will still start up successfully but complain about not being able to connect to Cassandra. Gremlin-Server will still listen for connections and proceed to accept them; however, any traversal requests just return an error.

a) Is it normal behavior for Gremlin-Server to continue even though the selected backend cannot be contacted on start-up?
b) If it's normal behavior, is there a way for me to send a command to Gremlin-Server so it will attempt a reconnection once I know Cassandra is up?


Use of mgmt.setTTL to expire edges

ke...@...
 

Hi,

I am experimenting with using the mgmt.setTTL option to automatically expire edges after 7 days, which is working well and generating much less overhead than trying to run a task that finds and drops the edges. However, this option does not drop the entries from external indexes in Elasticsearch. I imagine this is because the TTL is passed directly to the storage backend (Cassandra in my case) and so only takes effect there.

Is this working as expected? Are there plans to sync the deletions in Elasticsearch? Or should I plan to manually run a purge from Elasticsearch in line with the 7 day expiry in Cassandra?
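[For context, the edge-TTL setup under discussion looks roughly like this in the management API — a sketch; the 'visits' edge label is a hypothetical example:]

mgmt = graph.openManagement()
visits = mgmt.makeEdgeLabel('visits').make()
// the TTL is enforced by the storage backend (Cassandra here), which is
// why mixed-index entries in Elasticsearch are not purged along with it
mgmt.setTTL(visits, Duration.ofDays(7))
mgmt.commit()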

Thanks for any of your views!

Kevin


Re: keyspaces in JanusGraph

Jean-Baptiste Musso <jbm...@...>
 

Hello,

Side note to your question: I suggest that you use https://www.npmjs.com/package/gremlin instead; gremlin-client is deprecated (unsure whether that shows up in the log when installing).

Jean-Baptiste

On Wed, 28 Jun 2017 at 17:52, Peter Musial <pmmu...@...> wrote:
Hi All,

Cassandra allows definition of multiple keyspaces.  I am using Node.js with the GremlinClient module (npm install gremlin-client) to handle query execution.  Although I know how to set up a keyspace, it is not clear to me how to initialize the gremlin client to use a specific keyspace.  Is there a programmatic way of selecting a keyspace, or is it only possible when loading gremlin-server with a specific configuration file?


Thanks,

Peter

--
Jean-Baptiste


Re: keyspaces in JanusGraph

Ted Wilmes <twi...@...>
 

Hi Peter,
You can set up Janus to use a keyspace of your choosing.  See the configuration section of the docs for the relevant properties: http://docs.janusgraph.org/latest/config-ref.html#_storage_cassandra.

--Ted
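
[Ted's suggestion amounts to a one-line setting in the graph's properties file — a sketch; the keyspace name here is a hypothetical example:]

# JanusGraph properties file (sketch)
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
# each JanusGraph graph lives in its own Cassandra keyspace
storage.cassandra.keyspace=my_graph_keyspace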


On Wednesday, June 28, 2017 at 10:52:04 AM UTC-5, Peter wrote:
Hi All,

Cassandra allows definition of multiple keyspaces.  I am using Node.js with the GremlinClient module (npm install gremlin-client) to handle query execution.  Although I know how to set up a keyspace, it is not clear to me how to initialize the gremlin client to use a specific keyspace.  Is there a programmatic way of selecting a keyspace, or is it only possible when loading gremlin-server with a specific configuration file?


Thanks,

Peter


Re: Support for ES 5 and Cassandra datastax driver

Ted Wilmes <twi...@...>
 

Hi Mountu,
Elasticsearch 5.x and a new CQL storage adapter will be in the next release, 0.2.0. 

Thanks,
Ted


On Sunday, July 2, 2017 at 6:44:42 AM UTC-5, Mountu Jinwala wrote:
Hi,

Does JanusGraph have support for Elasticsearch 5.4 and for Cassandra using the DataStax CQL driver instead of the Astyanax client?

thanks
Mountu


Re: java.io.EOFException in kryo+blvp error in bulk loading

marc.d...@...
 

Hi Eliz and Meng,

Did the sequence of gremlin commands work for the tinkerpop-modern.kryo and grateful-dead.kryo example files?

How did you create the test.kryo file?

Marc

Op woensdag 28 juni 2017 15:02:27 UTC+2 schreef Elizabeth:

Hi all,  

I was using the Kryo format and BulkLoaderVertexProgram to load large files into JanusGraph, and encountered an error:

gremlin> hdfs.copyFromLocal('data/test.kryo','data/test.kryo')
==>null
gremlin> graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin>
gremlin> blvp = BulkLoaderVertexProgram.build().writeGraph('conf/janusgraph-hbase-es.properties').create(graph)
==>BulkLoaderVertexProgram[bulkLoader=IncrementalBulkLoader, vertexIdProperty=bulkLoader.vertex.id, userSuppliedIds=false, keepOriginalIds=true, batchSize=0]
gremlin>
gremlin> result = graph.compute(SparkGraphComputer).program(blvp).submit().get()
20:21:32 ERROR org.apache.spark.executor.Executor  - Exception in task 0.0 in stage 0.0 (TID 0)
java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:267)
    at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoRecordReader.seekToHeader(GryoRecordReader.java:93)
    at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoRecordReader.initialize(GryoRecordReader.java:85)
    at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat.createRecordReader(GryoInputFormat.java:38)

Has anyone encountered this error before? Please help me with this last step!
Any ideas are appreciated!

Best,
Meng


Support for ES 5 and Cassandra datastax driver

Mountu Jinwala <maji...@...>
 

Hi,

Does JanusGraph have support for Elasticsearch 5.4 and for Cassandra using the DataStax CQL driver instead of the Astyanax client?

thanks
Mountu


Re: Who is using JanusGraph in production?

Kelvin Lawrence <kelvin....@...>
 

Hi Jimmy, as you would expect, here at IBM we have a lot of projects underway that will use JanusGraph.

I try not to do product ads on open source lists, but we are certainly adopters; in fact, I have the Gremlin console up in front of me, connected to a JanusGraph graph, as I type this :-)

Cheers,
Kelvin


On Friday, April 7, 2017 at 8:15:31 AM UTC-5, Jimmy wrote:
Lovely and promising project! I want to know if anyone is using JanusGraph in production at present? Thanks!


keyspaces in JanusGraph

Peter Musial <pmmu...@...>
 

Hi All,

Cassandra allows definition of multiple keyspaces.  I am using Node.js with the GremlinClient module (npm install gremlin-client) to handle query execution.  Although I know how to set up a keyspace, it is not clear to me how to initialize the gremlin client to use a specific keyspace.  Is there a programmatic way of selecting a keyspace, or is it only possible when loading gremlin-server with a specific configuration file?


Thanks,

Peter


java.io.EOFException in kryo+blvp error in bulk loading

Elizabeth <hlf...@...>
 

Hi all,  

I was using the Kryo format and BulkLoaderVertexProgram to load large files into JanusGraph, and encountered an error:

gremlin> hdfs.copyFromLocal('data/test.kryo','data/test.kryo')
==>null
gremlin> graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin>
gremlin> blvp = BulkLoaderVertexProgram.build().writeGraph('conf/janusgraph-hbase-es.properties').create(graph)
==>BulkLoaderVertexProgram[bulkLoader=IncrementalBulkLoader, vertexIdProperty=bulkLoader.vertex.id, userSuppliedIds=false, keepOriginalIds=true, batchSize=0]
gremlin>
gremlin> result = graph.compute(SparkGraphComputer).program(blvp).submit().get()
20:21:32 ERROR org.apache.spark.executor.Executor  - Exception in task 0.0 in stage 0.0 (TID 0)
java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:267)
    at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoRecordReader.seekToHeader(GryoRecordReader.java:93)
    at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoRecordReader.initialize(GryoRecordReader.java:85)
    at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat.createRecordReader(GryoInputFormat.java:38)

Has anyone encountered this error before? Please help me with this last step!
Any ideas are appreciated!

Best,
Meng


Loading 10k nodes on Janusgraph/BerkeleyDB

Damien Seguy <damie...@...>
 

Hi,

I'm running JanusGraph 0.1.1 on OS X, with BerkeleyDB as the backend. I used -Xms256m and -Xmx5g.

I'm trying to load a GraphSON file into Janus. I have various GraphSON files of various sizes.

When the file is below 10k nodes, loading usually goes well. It is much faster with 200 nodes than with 9,000 (which sounds normal).

When I reach 10k nodes, something goes wrong and BerkeleyDB emits a lot of errors:

176587 [pool-6-thread-1] WARN  org.janusgraph.diskstorage.log.kcvs.KCVSLog  - Could not read messages for timestamp [2017-06-27T16:28:42.502Z] (this read will be retried)

org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception


Caused by: com.sleepycat.je.ThreadInterruptedException: (JE 7.3.7) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 7.3.7) db/berkeley java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.




The load script is simple : 


graph.io(IoCore.graphson()).readGraph("/tmp/file.graphson");



There are no indexes (yet).


Sometimes I manage to query the graph over another connection (the loading never ends), and g.V().count() returns 10000.

This looks like a transaction/batch-size issue, but I don't know where to go with that information.

I'm sure there is something huge that I'm missing. Any pointer would be helpful.
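
[For readers hitting the same wall: JanusGraph has a few settings aimed at bulk loading that are worth checking — a sketch with illustrative values; see the bulk loading chapter of the docs:]

# illustrative bulk-load settings for a JanusGraph properties file
storage.batch-loading=true
# larger id blocks mean fewer id-allocation round trips during big inserts
ids.block-size=100000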


Damien Seguy 



Re: MapReduceIndexManagement reindex not completing successfully

Nigel Brown <nigel...@...>
 

I should add that I get the same results from the simple example in the docs, with only a couple of nodes.
// Open a graph
graph = JanusGraphFactory.open("target.properties")
g = graph.traversal()

// Define a property
mgmt = graph.openManagement()
desc = mgmt.makePropertyKey("desc").dataType(String.class).make()
mgmt.commit()

// Insert some data
graph.addVertex("desc", "foo bar")
graph.addVertex("desc", "foo baz")
graph.tx().commit()

// Run a query -- note the planner warning recommending the use of an index
g.V().has("desc", textContains("baz"))

// Create an index
mgmt = graph.openManagement()

desc = mgmt.getPropertyKey("desc")
mixedIndex = mgmt.buildIndex("mixedExample", Vertex.class).addKey(desc).buildMixedIndex("search")
mgmt.commit()

// Rollback or commit transactions on the graph which predate the index definition
graph.tx().rollback()

// Block until the SchemaStatus transitions from INSTALLED to REGISTERED
report = mgmt.awaitGraphIndexStatus(graph, "mixedExample").call()

// Run a JanusGraph-Hadoop job to reindex
mgmt = graph.openManagement()
mr = new MapReduceIndexManagement(graph)
mr.updateIndex(mgmt.getGraphIndex("mixedExample"), SchemaAction.REINDEX).get()

// Enable the index
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("mixedExample"), SchemaAction.ENABLE_INDEX).get()
mgmt.commit()

// Block until the SchemaStatus is ENABLED
mgmt = graph.openManagement()
report = mgmt.awaitGraphIndexStatus(graph, "mixedExample").status(SchemaStatus.ENABLED).call()
mgmt.rollback()

// Run a query -- JanusGraph will use the new index, no planner warning
g.V().has("desc", textContains("baz"))

// Concerned that JanusGraph could have read cache in that last query, instead of relying on the index?
// Start a new instance to rule out cache hits.  Now we're definitely using the index.
graph.close()
graph = JanusGraphFactory.open("target.properties")
g.V().has("desc", textContains("baz"))

 


MapReduceIndexManagement reindex not completing successfully

nigel...@...
 

I am using a snapshot build, janusgraph-0.2.0-SNAPSHOT-hadoop2, and I am trying to reindex a mixed index using a map reduce job.

graph = JanusGraphFactory.open('target.properties')
mgmt = graph.openManagement()
mr = new MapReduceIndexManagement(graph)
mr.updateIndex(mgmt.getGraphIndex("mixedV"), SchemaAction.REINDEX).get()

This starts up, and tries to do some work (I have tried stepping through the code). There is a warning at start-up:
15:07:16 WARN  org.apache.hadoop.mapreduce.JobResourceUploader  - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.

I don't think that is relevant.

After a short time I get multiple warnings like this


15:08:09 WARN  org.apache.thrift.transport.TIOStreamTransport  - Error closing output stream.

java.net.SocketException: Socket closed

at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116)

at java.net.SocketOutputStream.write(SocketOutputStream.java:153)

at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)

at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)

at java.io.FilterOutputStream.close(FilterOutputStream.java:158)

at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)

at org.apache.thrift.transport.TSocket.close(TSocket.java:194)

at org.apache.thrift.transport.TFramedTransport.close(TFramedTransport.java:89)

at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.closeClient(ThriftSyncConnectionFactoryImpl.java:272)

at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.access$800(ThriftSyncConnectionFactoryImpl.java:92)

at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection$2.run(ThriftSyncConnectionFactoryImpl.java:254)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)


Eventually the job fails

15:08:37 WARN  org.apache.hadoop.mapred.LocalJobRunner  - job_local311804379_0001

java.lang.Exception: org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception

at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)

at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)

Caused by: org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception

at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)

at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:159)

at org.janusgraph.diskstorage.log.kcvs.KCVSLog.readSetting(KCVSLog.java:818)

at org.janusgraph.diskstorage.log.kcvs.KCVSLog.<init>(KCVSLog.java:270)

at org.janusgraph.diskstorage.log.kcvs.KCVSLogManager.openLog(KCVSLogManager.java:225)

at org.janusgraph.diskstorage.Backend.initialize(Backend.java:275)

at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1841)

at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)

at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:107)

at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:97)

at org.janusgraph.hadoop.scan.HadoopVertexScanMapper.setup(HadoopVertexScanMapper.java:37)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT4S

at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)

at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)

... 19 more

Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend

at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:128)

at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:92)

at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getSlice(AstyanaxKeyColumnValueStore.java:81)

at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)

at org.janusgraph.diskstorage.keycolumnvalue.KCVSUtil.get(KCVSUtil.java:52)

at org.janusgraph.diskstorage.log.kcvs.KCVSLog$3.call(KCVSLog.java:821)

at org.janusgraph.diskstorage.log.kcvs.KCVSLog$3.call(KCVSLog.java:818)

at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:148)

at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:162)

at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)

... 20 more

Caused by: com.netflix.astyanax.connectionpool.exceptions.TransportException: TransportException: [host=127.0.0.1(127.0.0.1):9160, latency=0(0), attempts=1]org.apache.thrift.transport.TTransportException: java.net.SocketException: Bad file descriptor

at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:197)

at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:139)

at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)

at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352)

at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:538)

at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:126)

... 29 more

Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Bad file descriptor

at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)

at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:158)

at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)

at org.apache.cassandra.thrift.Cassandra$Client.send_set_keyspace(Cassandra.java:602)

at org.apache.cassandra.thrift.Cassandra$Client.set_keyspace(Cassandra.java:594)

at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:127)

... 33 more

Caused by: java.net.SocketException: Bad file descriptor

at java.net.SocketOutputStream.socketWrite0(Native Method)

at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)

at java.net.SocketOutputStream.write(SocketOutputStream.java:153)

at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)

at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)

at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)

... 38 more

15:08:37 WARN  org.apache.thrift.transport.TIOStreamTransport  - Error closing output stream.

java.net.SocketException: Socket closed

at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116)

at java.net.SocketOutputStream.write(SocketOutputStream.java:153)

at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)

at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)

at java.io.FilterOutputStream.close(FilterOutputStream.java:158)

at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)

at org.apache.thrift.transport.TSocket.close(TSocket.java:194)

at org.apache.thrift.transport.TFramedTransport.close(TFramedTransport.java:89)

at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.closeClient(ThriftSyncConnectionFactoryImpl.java:272)

at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.access$800(ThriftSyncConnectionFactoryImpl.java:92)

at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection$2.run(ThriftSyncConnectionFactoryImpl.java:254)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

java.io.IOException: MapReduce JobID job_local311804379_0001 terminated abnormally: state=FAILED, failureinfo=NA

Type ':help' or ':h' for help.

Display stack trace? [yN]y

java.util.concurrent.ExecutionException: java.io.IOException: MapReduce JobID job_local311804379_0001 terminated abnormally: state=FAILED, failureinfo=NA

at org.janusgraph.hadoop.MapReduceIndexManagement$FailedJobFuture.get(MapReduceIndexManagement.java:297)

at org.janusgraph.hadoop.MapReduceIndexManagement$FailedJobFuture.get(MapReduceIndexManagement.java:267)

at java_util_concurrent_Future$get.call(Unknown Source)

at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)

at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)

at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:117)

at groovysh_evaluate.run(groovysh_evaluate:3)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:70)

at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:191)

at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy)

at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:497)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)

at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:72)

at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)

at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:497)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:124)

at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:497)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:83)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:169)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:478)

Caused by: java.io.IOException: MapReduce JobID job_local311804379_0001 terminated abnormally: state=FAILED, failureinfo=NA

at org.janusgraph.hadoop.scan.HadoopScanRunner.runJob(HadoopScanRunner.java:147)

at org.janusgraph.hadoop.MapReduceIndexManagement.updateIndex(MapReduceIndexManagement.java:186)

at org.janusgraph.hadoop.MapReduceIndexManagement$updateIndex.call(Unknown Source)

at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)

at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)

at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:133)

... 42 more


Other map-reduce jobs seem to run (e.g. page rank and some of the other demos).

I can reindex with the Management API. I am assuming our graphs will get too big for that. This is a small graph with a few thousand nodes. Cassandra running locally on one machine.

Any comments or hints on getting this running would be most welcome.





Re: professional support for JanusGraph

Peter Musial <pmmu...@...>
 

Good point.  Thank you.


On Monday, June 26, 2017 at 2:28:04 PM UTC-4, Kelvin Lawrence wrote:
Hi Peter, I try not to do product ads on open source mailing lists so I'll just mention, in case others find this thread,  that there are definitely going to be announcements in this area from at least one company that I am familiar with.

Cheers
Kelvin

On Friday, June 23, 2017 at 10:30:12 AM UTC-5, Peter Musial wrote:
Hi All,

Are there companies that provide professional support for production deployments?

Regards,

Peter


Re: professional support for JanusGraph

Kelvin Lawrence <kelvin....@...>
 

Hi Peter, I try not to do product ads on open source mailing lists so I'll just mention, in case others find this thread,  that there are definitely going to be announcements in this area from at least one company that I am familiar with.

Cheers
Kelvin


On Friday, June 23, 2017 at 10:30:12 AM UTC-5, Peter Musial wrote:
Hi All,

Are there companies that provide professional support for production deployments?

Regards,

Peter


Re: bulk loading error

HadoopMarc <m.c.d...@...>
 

And this was the answer that Eliz referred to above:

Hi Eliz,

Good to hear that you make progress. I do not see this post on the gremlin users list. Would you be so kind as to post it there? I'll then add the answers below.

As to your questions:

  • id block reservation during bulkload is described in section 20.1.2 of:
    http://docs.janusgraph.org/latest/bulk-loading.html

  • Fighting GC/OOM: give the Gremlin Console's JVM more memory in its startup script (the java -Xmx command line option). Another possibility is to limit transactions to, say, 100,000 vertices, i.e. commit more often.

  • OLAP: the exception does not seem familiar to me. Maybe the JG code example refers to an older TP version.
    Therefore, it could help if you compare with the blvp example in TP (TP runs all code examples during ref doc generation!):
    http://tinkerpop.apache.org/docs/3.2.3/reference/#sparkgraphcomputer
    As far as I know the dependencies in the JG distribution are complete and do not need a TP install.
HTH,
Marc



Op maandag 26 juni 2017 15:30:03 UTC+2 schreef Ted Wilmes:

Hi Eliz,
For your first code snippet, you'll need to add a periodic commit every X number of vertices instead of committing after you've loaded the whole file. That X will vary depending on your hardware, etc., but you can experiment and find what gives you the best performance. I'd suggest starting at 100 and going from there. Once you get that working, you could try loading data in parallel by spinning up multiple threads that are addV'ing and periodically committing.
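
[The periodic-commit approach described above could be sketched like this — the batch size and file path are illustrative, and the property names follow the schema quoted later in this thread:]

batchSize = 100
count = 0
new File('/tmp/vertices.txt').eachLine { line ->
    graph.addVertex(label, 'userId', 'uid', Long.parseLong(line.trim()))
    // commit in small batches instead of one giant transaction
    if (++count % batchSize == 0) graph.tx().commit()
}
graph.tx().commit()  // commit the final partial batch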

For the second approach, using the TinkerPop BulkLoaderVertexProgram, you do not need to download TP separately. From looking at your stack trace, I think you may just be missing a bit in how you constructed the vertex program. Did you call create() at the end of its construction, like in this little snippet?

blvp = BulkLoaderVertexProgram.build().
                    bulkLoader(OneTimeBulkLoader).
                    writeGraph(writeGraphConf).create(modern)

Create takes the input graph that you're reading from as an argument.

--Ted

On Sunday, June 25, 2017 at 8:48:57 PM UTC-5, Elizabeth wrote:
Hi Marc,

This is for your request for posting here:)

Thank so much! I indeed followed "the powers of ten", and made it even simpler to load -- not  to check if the vertex is already existent, I have done it beforehand. Here is the code, just readline and addVertex row by row: 

 def loadTestSchema(graph)  {
    g = graph.traversal()

    t=System.currentTimeMillis()
    new File("/home/dev/wanmeng/adjlist/vertices1000000.txt").eachLine{l-> p=l; graph.addVertex(label,"userId","uid", p);  }
    graph.tx().commit()

    u = System.currentTimeMillis()-t
    print u/1000+" seconds \n"
    g = graph.traversal()
    g.V().has('uid', 1)

}

The schema is as follows:
def defineTestSchema(graph) {
    mgmt = graph.openManagement()
    g = graph.traversal()
    // vertex labels
    userId= mgmt.makeVertexLabel("userId").make()
    // edge labels
    relatedby = mgmt.makeEdgeLabel("relatedby").make()
    // vertex and edge properties
    uid = mgmt.makePropertyKey("uid").dataType(Long.class).cardinality(Cardinality.SET).make()
    // global indices
    //mgmt.buildIndex("byuid", Vertex.class).addKey(uid).indexOnly(userId).buildCompositeIndex()
    mgmt.buildIndex("byuid", Vertex.class).addKey(uid).buildCompositeIndex()
    mgmt.commit()

    //mgmt = graph.openManagement()
    //mgmt.updateIndex(mgmt.getGraphIndex('byuid'), SchemaAction.REINDEX).get()
    //mgmt.commit()
}

configuration file is : janusgraph-hbase-es.properties

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
schema.default=none
storage.hostname=127.0.0.1
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

index.search.elasticsearch.interface=TRANSPORT_CLIENT
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

However, the loading time is still very long.

100     0.026s
10k    49.001seconds 
100k  35.827 seconds
1million 379.05 seconds.
10 million: error 
gremlin> loadTestSchema(graph)
15:59:27 WARN  org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority  - Temporary storage exception while acquiring id block - retrying in PT0.6S: org.janusgraph.diskstorage.TemporaryBackendException: Wrote claim for id block [2880001, 2960001) in PT2.213S => too slow, threshold is: PT0.3S
GC overhead limit exceeded
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.OutOfMemoryError: GC overhead limit exceeded

What i am wondering is
1) that why does bulk-loading seem not working, though I have already set storage.batch-loading=true, what else should I set to make bulk-loading take effect?  do I need to drop the index in order to speed up bulk loading?
2) how to solve the GC overhead limit exceeding?

3) At the same time, I am using the Kryo+ BulkLoaderVertexProgram to load 
the last step failed:

gremlin> graph.compute(SparkGraphComputer).program(blvp).submit().get()
No signature of method: org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.program() is applicable for argument types: (org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder) values: [org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder@6bb4cc0e]
Possible solutions: program(org.apache.tinkerpop.gremlin.process.computer.VertexProgram), profile(java.util.concurrent.Callable)

Do I need to install tinkerPop 3 besides Janusgraph to use this graph.compute(SparkGraphComputer).program(blvp).submit().get()?

Many thanks!

Eliz


Re: bulk loading error

Ted Wilmes <twi...@...>
 

Hi Eliz,
For your first code snippet, you'll need to add a periodic commit every X vertices instead of committing only once after you've loaded the whole file. The right X varies with your hardware, so experiment to find what gives you the best performance; I'd suggest starting at 100 and going from there. Once you get that working, you could try loading data in parallel by spinning up multiple threads that are addV'ing and periodically committing.
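The periodic-commit idea above can be sketched as follows. This is a minimal, untested sketch assuming the same one-uid-per-line input file used elsewhere in this thread; loadInBatches and batchSize are hypothetical names, and the Long.parseLong call assumes the uid values are numeric (the uid key is declared with dataType(Long.class)).

```groovy
// Sketch: commit every batchSize vertices instead of once at the very end.
// batchSize is a tunable starting point, not a recommendation.
def loadInBatches(graph, String path, int batchSize = 100) {
    int count = 0
    new File(path).eachLine { line ->
        // uid is declared as Long in the schema, so parse the line explicitly
        graph.addVertex(label, 'userId', 'uid', Long.parseLong(line.trim()))
        if (++count % batchSize == 0) {
            graph.tx().commit()   // keep each transaction (and heap usage) bounded
        }
    }
    graph.tx().commit()           // commit the final partial batch
}
```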

For the second approach, using the TinkerPop BulkLoaderVertexProgram, you do not need to download TinkerPop separately. From your stack trace, I think you may just be missing a step when you constructed the vertex program. Did you call create at the end of its construction, as in this little snippet?

blvp = BulkLoaderVertexProgram.build().
        bulkLoader(OneTimeBulkLoader).
        writeGraph(writeGraphConf).
        create(modern)

create() takes the input graph that you're reading from as its argument.
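Given the "No signature of method ... program()" error quoted in this thread, the likely fix is exactly this missing create() call: program() expects a VertexProgram, while build() alone returns a Builder. A hedged sketch of the full submission, where readGraph and writeGraphConf are placeholder names for the input graph and the output-graph configuration:

```groovy
// build() returns a Builder; create(readGraph) finalizes it into a VertexProgram
blvp = BulkLoaderVertexProgram.build().
        bulkLoader(OneTimeBulkLoader).
        writeGraph(writeGraphConf).
        create(readGraph)

// program() now receives a VertexProgram, so the OLAP job can be submitted
readGraph.compute(SparkGraphComputer).program(blvp).submit().get()
```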

--Ted

On Sunday, June 25, 2017 at 8:48:57 PM UTC-5, Elizabeth wrote:


bulk loading error

Elizabeth <hlf...@...>
 

Hi Marc,

This is for your request for posting here:)

Thanks so much! I indeed followed "the powers of ten" and made the load even simpler: it does not check whether a vertex already exists, since I deduplicated the data beforehand. Here is the code; it just reads a line and calls addVertex, row by row:

def loadTestSchema(graph) {
    g = graph.traversal()

    t = System.currentTimeMillis()
    new File("/home/dev/wanmeng/adjlist/vertices1000000.txt").eachLine { l ->
        p = l
        graph.addVertex(label, "userId", "uid", p)
    }
    graph.tx().commit()

    u = System.currentTimeMillis() - t
    print u/1000 + " seconds \n"
    g = graph.traversal()
    g.V().has('uid', 1)
}

The schema is as follows:
def defineTestSchema(graph) {
    mgmt = graph.openManagement()
    g = graph.traversal()
    // vertex labels
    userId= mgmt.makeVertexLabel("userId").make()
    // edge labels
    relatedby = mgmt.makeEdgeLabel("relatedby").make()
    // vertex and edge properties
    uid = mgmt.makePropertyKey("uid").dataType(Long.class).cardinality(Cardinality.SET).make()
    // global indices
    //mgmt.buildIndex("byuid", Vertex.class).addKey(uid).indexOnly(userId).buildCompositeIndex()
    mgmt.buildIndex("byuid", Vertex.class).addKey(uid).buildCompositeIndex()
    mgmt.commit()

    //mgmt = graph.openManagement()
    //mgmt.updateIndex(mgmt.getGraphIndex('byuid'), SchemaAction.REINDEX).get()
    //mgmt.commit()
}

The configuration file is janusgraph-hbase-es.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
schema.default=none
storage.hostname=127.0.0.1
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

index.search.elasticsearch.interface=TRANSPORT_CLIENT
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

However, the loading time is still very long.

100:        0.026 seconds
10k:        49.001 seconds
100k:       35.827 seconds
1 million:  379.05 seconds
10 million: error
gremlin> loadTestSchema(graph)
15:59:27 WARN  org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority  - Temporary storage exception while acquiring id block - retrying in PT0.6S: org.janusgraph.diskstorage.TemporaryBackendException: Wrote claim for id block [2880001, 2960001) in PT2.213S => too slow, threshold is: PT0.3S
GC overhead limit exceeded
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.OutOfMemoryError: GC overhead limit exceeded

What I am wondering is:
1) Why does bulk loading not seem to be working, even though I have already set storage.batch-loading=true? What else should I set for bulk loading to take effect? Do I need to drop the index to speed up bulk loading?
2) How do I solve the "GC overhead limit exceeded" error?

3) At the same time, I am using Kryo + BulkLoaderVertexProgram to load the data, but the last step failed:

gremlin> graph.compute(SparkGraphComputer).program(blvp).submit().get()
No signature of method: org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.program() is applicable for argument types: (org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder) values: [org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder@6bb4cc0e]
Possible solutions: program(org.apache.tinkerpop.gremlin.process.computer.VertexProgram), profile(java.util.concurrent.Callable)

Do I need to install TinkerPop 3 besides JanusGraph to use graph.compute(SparkGraphComputer).program(blvp).submit().get()?

Many thanks!

Eliz
