
Loading 10k nodes on JanusGraph/BerkeleyDB

Damien Seguy <damie...@...>
 

Hi,

I'm running JanusGraph 0.1.1 on OS X, with BerkeleyDB as the storage backend. The JVM is configured with -Xms256m and -Xmx5g.

I'm trying to load GraphSON files of various sizes into JanusGraph.

When the GraphSON is below 10k nodes, loading usually goes well. It is much faster with 200 nodes than with 9,000 (which sounds normal).

When I reach 10k nodes, something goes wrong and BerkeleyDB emits a lot of errors:

176587 [pool-6-thread-1] WARN  org.janusgraph.diskstorage.log.kcvs.KCVSLog  - Could not read messages for timestamp [2017-06-27T16:28:42.502Z] (this read will be retried)
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
Caused by: com.sleepycat.je.ThreadInterruptedException: (JE 7.3.7) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 7.3.7) db/berkeley java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.




The load script is simple:


graph.io(IoCore.graphson()).readGraph("/tmp/file.graphson");



There are no indexes (yet).


Sometimes I manage to query the graph over another connection (the loading never finishes), and g.V().count() returns 10000.

This looks like a transaction/batch-size issue, but I don't know where to go with that information.


I'm sure I'm missing something obvious. Any pointers would be helpful.


Damien Seguy 



Re: MapReduceIndexManagement reindex not completing successfully

Nigel Brown <nigel...@...>
 

I should add that I get the same results from the simple example in the docs, with only a couple of nodes:
// Open a graph
graph = JanusGraphFactory.open("target.properties")
g = graph.traversal()

// Define a property
mgmt = graph.openManagement()
desc = mgmt.makePropertyKey("desc").dataType(String.class).make()
mgmt.commit()

// Insert some data
graph.addVertex("desc", "foo bar")
graph.addVertex("desc", "foo baz")
graph.tx().commit()

// Run a query -- note the planner warning recommending the use of an index
g.V().has("desc", textContains("baz"))

// Create an index
mgmt = graph.openManagement()

desc = mgmt.getPropertyKey("desc")
mixedIndex = mgmt.buildIndex("mixedExample", Vertex.class).addKey(desc).buildMixedIndex("search")
mgmt.commit()

// Rollback or commit transactions on the graph which predate the index definition
graph.tx().rollback()

// Block until the SchemaStatus transitions from INSTALLED to REGISTERED
report = mgmt.awaitGraphIndexStatus(graph, "mixedExample").call()

// Run a JanusGraph-Hadoop job to reindex
mgmt = graph.openManagement()
mr = new MapReduceIndexManagement(graph)
mr.updateIndex(mgmt.getGraphIndex("mixedExample"), SchemaAction.REINDEX).get()

// Enable the index
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("mixedExample"), SchemaAction.ENABLE_INDEX).get()
mgmt.commit()

// Block until the SchemaStatus is ENABLED
mgmt = graph.openManagement()
report = mgmt.awaitGraphIndexStatus(graph, "mixedExample").status(SchemaStatus.ENABLED).call()
mgmt.rollback()

// Run a query -- JanusGraph will use the new index, no planner warning
g.V().has("desc", textContains("baz"))

// Concerned that JanusGraph could have read cache in that last query, instead of relying on the index?
// Start a new instance to rule out cache hits.  Now we're definitely using the index.
graph.close()
graph = JanusGraphFactory.open("target.properties")
g.V().has("desc", textContains("baz"))

 


MapReduceIndexManagement reindex not completing successfully

nigel...@...
 

I am using a snapshot build, janusgraph-0.2.0-SNAPSHOT-hadoop2, and I am trying to reindex a mixed index using a MapReduce job:

graph = JanusGraphFactory.open('target.properties')
mgmt = graph.openManagement()
mr = new MapReduceIndexManagement(graph)
mr.updateIndex(mgmt.getGraphIndex("mixedV"), SchemaAction.REINDEX).get()

This starts up and tries to do some work (I have stepped through the code). There is a warning at startup:
15:07:16 WARN  org.apache.hadoop.mapreduce.JobResourceUploader  - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.

I don't think that is relevant.

After a short time I get multiple warnings like this:


15:08:09 WARN  org.apache.thrift.transport.TIOStreamTransport  - Error closing output stream.
java.net.SocketException: Socket closed
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
    at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
    at org.apache.thrift.transport.TSocket.close(TSocket.java:194)
    at org.apache.thrift.transport.TFramedTransport.close(TFramedTransport.java:89)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.closeClient(ThriftSyncConnectionFactoryImpl.java:272)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.access$800(ThriftSyncConnectionFactoryImpl.java:92)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection$2.run(ThriftSyncConnectionFactoryImpl.java:254)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


Eventually the job fails

15:08:37 WARN  org.apache.hadoop.mapred.LocalJobRunner  - job_local311804379_0001
java.lang.Exception: org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:159)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLog.readSetting(KCVSLog.java:818)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLog.<init>(KCVSLog.java:270)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLogManager.openLog(KCVSLogManager.java:225)
    at org.janusgraph.diskstorage.Backend.initialize(Backend.java:275)
    at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1841)
    at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:107)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:97)
    at org.janusgraph.hadoop.scan.HadoopVertexScanMapper.setup(HadoopVertexScanMapper.java:37)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT4S
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
    ... 19 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:128)
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:92)
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getSlice(AstyanaxKeyColumnValueStore.java:81)
    at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
    at org.janusgraph.diskstorage.keycolumnvalue.KCVSUtil.get(KCVSUtil.java:52)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLog$3.call(KCVSLog.java:821)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLog$3.call(KCVSLog.java:818)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:148)
    at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:162)
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
    ... 20 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.TransportException: TransportException: [host=127.0.0.1(127.0.0.1):9160, latency=0(0), attempts=1]org.apache.thrift.transport.TTransportException: java.net.SocketException: Bad file descriptor
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:197)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:139)
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352)
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:538)
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:126)
    ... 29 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Bad file descriptor
    at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
    at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:158)
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
    at org.apache.cassandra.thrift.Cassandra$Client.send_set_keyspace(Cassandra.java:602)
    at org.apache.cassandra.thrift.Cassandra$Client.set_keyspace(Cassandra.java:594)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:127)
    ... 33 more
Caused by: java.net.SocketException: Bad file descriptor
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
    ... 38 more

15:08:37 WARN  org.apache.thrift.transport.TIOStreamTransport  - Error closing output stream.
java.net.SocketException: Socket closed
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
    at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
    at org.apache.thrift.transport.TSocket.close(TSocket.java:194)
    at org.apache.thrift.transport.TFramedTransport.close(TFramedTransport.java:89)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.closeClient(ThriftSyncConnectionFactoryImpl.java:272)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.access$800(ThriftSyncConnectionFactoryImpl.java:92)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection$2.run(ThriftSyncConnectionFactoryImpl.java:254)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

java.io.IOException: MapReduce JobID job_local311804379_0001 terminated abnormally: state=FAILED, failureinfo=NA
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.util.concurrent.ExecutionException: java.io.IOException: MapReduce JobID job_local311804379_0001 terminated abnormally: state=FAILED, failureinfo=NA
    at org.janusgraph.hadoop.MapReduceIndexManagement$FailedJobFuture.get(MapReduceIndexManagement.java:297)
    at org.janusgraph.hadoop.MapReduceIndexManagement$FailedJobFuture.get(MapReduceIndexManagement.java:267)
    at java_util_concurrent_Future$get.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:117)
    at groovysh_evaluate.run(groovysh_evaluate:3)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
    at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:70)
    at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:191)
    at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy)
    at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
    at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:72)
    at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)
    at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:124)
    at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:83)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
    at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:169)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
    at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:478)
Caused by: java.io.IOException: MapReduce JobID job_local311804379_0001 terminated abnormally: state=FAILED, failureinfo=NA
    at org.janusgraph.hadoop.scan.HadoopScanRunner.runJob(HadoopScanRunner.java:147)
    at org.janusgraph.hadoop.MapReduceIndexManagement.updateIndex(MapReduceIndexManagement.java:186)
    at org.janusgraph.hadoop.MapReduceIndexManagement$updateIndex.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:133)
    ... 42 more


Other MapReduce jobs seem to run fine (e.g. PageRank and some of the other demos).

I can reindex with the management API, but I am assuming our graphs will eventually get too big for that. This is a small graph with a few thousand nodes, and Cassandra is running locally on one machine.
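(For reference, the management-API reindex path that does work for me is the standard blocking call — sketched here with the same graph and index name as above:)

```groovy
// Sketch: synchronous reindex through the management API, no MapReduce.
// Assumes `graph` is the open JanusGraph instance and 'mixedV' the index name.
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex('mixedV'), SchemaAction.REINDEX).get()
mgmt.commit()
```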

Any comments or hints on getting this running would be most welcome.





Re: professional support for JanusGraph

Peter Musial <pmmu...@...>
 

Good point.  Thank you.


On Monday, June 26, 2017 at 2:28:04 PM UTC-4, Kelvin Lawrence wrote:
Hi Peter, I try not to do product ads on open-source mailing lists, so I'll just mention, in case others find this thread, that there are definitely going to be announcements in this area from at least one company I am familiar with.

Cheers
Kelvin

On Friday, June 23, 2017 at 10:30:12 AM UTC-5, Peter Musial wrote:
Hi All,

Are there companies that provide professional support for production deployments?

Regards,

Peter


Re: professional support for JanusGraph

Kelvin Lawrence <kelvin....@...>
 

Hi Peter, I try not to do product ads on open-source mailing lists, so I'll just mention, in case others find this thread, that there are definitely going to be announcements in this area from at least one company I am familiar with.

Cheers
Kelvin


On Friday, June 23, 2017 at 10:30:12 AM UTC-5, Peter Musial wrote:
Hi All,

Are there companies that provide professional support for production deployments?

Regards,

Peter


Re: bulk loading error

HadoopMarc <m.c.d...@...>
 

And this was the answer that Eliz referred to above:

Hi Eliz,

Good to hear that you are making progress. I do not see this post on the gremlin-users list. Would you be so kind as to post it there? I'll then add the answers below.

As to your questions:

  • id block reservation during bulk loading is described in section 20.1.2 of:
    http://docs.janusgraph.org/latest/bulk-loading.html

  • Fighting GC/OOM: give the Gremlin Console's JVM more memory in its startup script (the java -Xmx command-line option). Another possibility is to limit transactions to, say, 100,000 vertices — that is, commit more often.

  • OLAP: the exception does not look familiar to me. Maybe the JG code example refers to an older TP version,
    so it could help to compare with the blvp example in TP (TP runs all code examples during reference doc generation!):
    http://tinkerpop.apache.org/docs/3.2.3/reference/#sparkgraphcomputer
    As far as I know, the dependencies in the JG distribution are complete and do not require a separate TP install.
HTH, Marc



Op maandag 26 juni 2017 15:30:03 UTC+2 schreef Ted Wilmes:

Hi Eliz,
For your first code snippet, you'll need to add a periodic commit every X vertices instead of committing once after you've loaded the whole file. That X will vary depending on your hardware, etc., but you can experiment and find what gives you the best performance; I'd suggest starting at 100 and going from there. Once you get that working, you could try loading data in parallel by spinning up multiple threads that are addV'ing and periodically committing.

For the second approach, using the TinkerPop BulkLoaderVertexProgram, you do not need to download TP separately. Looking at your stack trace, I think you may just be missing a step when constructing the vertex program. Did you call create at the end of its construction, as in this little snippet?

blvp = BulkLoaderVertexProgram.build().
                    bulkLoader(OneTimeBulkLoader).
                    writeGraph(writeGraphConf).create(modern)

Create takes the input graph that you're reading from as an argument.

--Ted

On Sunday, June 25, 2017 at 8:48:57 PM UTC-5, Elizabeth wrote:
Hi Marc,

This is in response to your request to post here. :)

Thanks so much! I indeed followed "the powers of ten" and made the load even simpler: it does not check whether a vertex already exists, because I have deduplicated the data beforehand. Here is the code; it just reads a line and calls addVertex, row by row:

 def loadTestSchema(graph)  {
    g = graph.traversal()

    t=System.currentTimeMillis()
    new File("/home/dev/wanmeng/adjlist/vertices1000000.txt").eachLine{l-> p=l; graph.addVertex(label,"userId","uid", p);  }
    graph.tx().commit()

    u = System.currentTimeMillis()-t
    print u/1000+" seconds \n"
    g = graph.traversal()
    g.V().has('uid', 1)

}

The schema is as follows:
def defineTestSchema(graph) {
    mgmt = graph.openManagement()
    g = graph.traversal()
    // vertex labels
    userId= mgmt.makeVertexLabel("userId").make()
    // edge labels
    relatedby = mgmt.makeEdgeLabel("relatedby").make()
    // vertex and edge properties
    uid = mgmt.makePropertyKey("uid").dataType(Long.class).cardinality(Cardinality.SET).make()
    // global indices
    //mgmt.buildIndex("byuid", Vertex.class).addKey(uid).indexOnly(userId).buildCompositeIndex()
    mgmt.buildIndex("byuid", Vertex.class).addKey(uid).buildCompositeIndex()
    mgmt.commit()

    //mgmt = graph.openManagement()
    //mgmt.updateIndex(mgmt.getGraphIndex('byuid'), SchemaAction.REINDEX).get()
    //mgmt.commit()
}

The configuration file is janusgraph-hbase-es.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
schema.default=none
storage.hostname=127.0.0.1
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

index.search.elasticsearch.interface=TRANSPORT_CLIENT
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

However, the loading time is still very long:

100: 0.026 seconds
10k: 49.001 seconds
100k: 35.827 seconds
1 million: 379.05 seconds
10 million: error
gremlin> loadTestSchema(graph)
15:59:27 WARN  org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority  - Temporary storage exception while acquiring id block - retrying in PT0.6S: org.janusgraph.diskstorage.TemporaryBackendException: Wrote claim for id block [2880001, 2960001) in PT2.213S => too slow, threshold is: PT0.3S
GC overhead limit exceeded
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.OutOfMemoryError: GC overhead limit exceeded

What I am wondering is:
1) Why does bulk loading seem not to be working, even though I have set storage.batch-loading=true? What else should I set to make bulk loading take effect? Do I need to drop the index to speed up bulk loading?
2) How do I solve the GC overhead limit exceeded error?

3) At the same time, I am using Kryo + BulkLoaderVertexProgram to load the data;
the last step failed:

gremlin> graph.compute(SparkGraphComputer).program(blvp).submit().get()
No signature of method: org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.program() is applicable for argument types: (org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder) values: [org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder@6bb4cc0e]
Possible solutions: program(org.apache.tinkerpop.gremlin.process.computer.VertexProgram), profile(java.util.concurrent.Callable)

Do I need to install TinkerPop 3 in addition to JanusGraph to use graph.compute(SparkGraphComputer).program(blvp).submit().get()?

Many thanks!

Eliz


Re: bulk loading error

Ted Wilmes <twi...@...>
 

Hi Eliz,
For your first code snippet, you'll need to add a periodic commit every X vertices instead of committing once after you've loaded the whole file. That X will vary depending on your hardware, etc., but you can experiment and find what gives you the best performance; I'd suggest starting at 100 and going from there. Once you get that working, you could try loading data in parallel by spinning up multiple threads that are addV'ing and periodically committing.
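A minimal sketch of that load loop (assuming an open graph `graph`, one uid per line in the input file, and a batch size of 10,000 — file path, property names, and batch size are placeholders to tune for your setup):

```groovy
// Sketch: add vertices with a periodic commit instead of one huge transaction.
batchSize = 10000
count = 0
new File('/tmp/vertices.txt').eachLine { line ->
    graph.addVertex(label, 'userId', 'uid', Long.parseLong(line.trim()))
    if (++count % batchSize == 0) {
        graph.tx().commit()   // flush this batch to the backend
    }
}
graph.tx().commit()           // commit the final partial batch
```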

For the second approach, using the TinkerPop BulkLoaderVertexProgram, you do not need to download TP separately. Looking at your stack trace, I think you may just be missing a step when constructing the vertex program. Did you call create at the end of its construction, as in this little snippet?

blvp = BulkLoaderVertexProgram.build().
                    bulkLoader(OneTimeBulkLoader).
                    writeGraph(writeGraphConf).create(modern)

Create takes the input graph that you're reading from as an argument.

--Ted

On Sunday, June 25, 2017 at 8:48:57 PM UTC-5, Elizabeth wrote:
Hi Marc,

This is in response to your request to post here. :)

Thanks so much! I indeed followed "the powers of ten" and made the load even simpler: it does not check whether a vertex already exists, because I have deduplicated the data beforehand. Here is the code; it just reads a line and calls addVertex, row by row:

 def loadTestSchema(graph)  {
    g = graph.traversal()

    t=System.currentTimeMillis()
    new File("/home/dev/wanmeng/adjlist/vertices1000000.txt").eachLine{l-> p=l; graph.addVertex(label,"userId","uid", p);  }
    graph.tx().commit()

    u = System.currentTimeMillis()-t
    print u/1000+" seconds \n"
    g = graph.traversal()
    g.V().has('uid', 1)

}

The schema is as follows:
def defineTestSchema(graph) {
    mgmt = graph.openManagement()
    g = graph.traversal()
    // vertex labels
    userId= mgmt.makeVertexLabel("userId").make()
    // edge labels
    relatedby = mgmt.makeEdgeLabel("relatedby").make()
    // vertex and edge properties
    uid = mgmt.makePropertyKey("uid").dataType(Long.class).cardinality(Cardinality.SET).make()
    // global indices
    //mgmt.buildIndex("byuid", Vertex.class).addKey(uid).indexOnly(userId).buildCompositeIndex()
    mgmt.buildIndex("byuid", Vertex.class).addKey(uid).buildCompositeIndex()
    mgmt.commit()

    //mgmt = graph.openManagement()
    //mgmt.updateIndex(mgmt.getGraphIndex('byuid'), SchemaAction.REINDEX).get()
    //mgmt.commit()
}

The configuration file is janusgraph-hbase-es.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
schema.default=none
storage.hostname=127.0.0.1
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

index.search.elasticsearch.interface=TRANSPORT_CLIENT
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

However, the loading time is still very long:

100: 0.026 seconds
10k: 49.001 seconds
100k: 35.827 seconds
1 million: 379.05 seconds
10 million: error
gremlin> loadTestSchema(graph)
15:59:27 WARN  org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority  - Temporary storage exception while acquiring id block - retrying in PT0.6S: org.janusgraph.diskstorage.TemporaryBackendException: Wrote claim for id block [2880001, 2960001) in PT2.213S => too slow, threshold is: PT0.3S
GC overhead limit exceeded
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.OutOfMemoryError: GC overhead limit exceeded

What I am wondering is:
1) Why does bulk loading seem not to be working, even though I have set storage.batch-loading=true? What else should I set to make bulk loading take effect? Do I need to drop the index to speed up bulk loading?
2) How do I solve the GC overhead limit exceeded error?

3) At the same time, I am using Kryo + BulkLoaderVertexProgram to load the data;
the last step failed:

gremlin> graph.compute(SparkGraphComputer).program(blvp).submit().get()
No signature of method: org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.program() is applicable for argument types: (org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder) values: [org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder@6bb4cc0e]
Possible solutions: program(org.apache.tinkerpop.gremlin.process.computer.VertexProgram), profile(java.util.concurrent.Callable)

Do I need to install TinkerPop 3 in addition to JanusGraph to use graph.compute(SparkGraphComputer).program(blvp).submit().get()?

Many thanks!

Eliz


bulk loading error

Elizabeth <hlf...@...>
 

Hi Marc,

This is in response to your request to post here. :)

Thanks so much! I indeed followed "the powers of ten" and made the load even simpler: it does not check whether a vertex already exists, because I have deduplicated the data beforehand. Here is the code; it just reads a line and calls addVertex, row by row:

 def loadTestSchema(graph)  {
    g = graph.traversal()

    t=System.currentTimeMillis()
    new File("/home/dev/wanmeng/adjlist/vertices1000000.txt").eachLine{l-> p=l; graph.addVertex(label,"userId","uid", p);  }
    graph.tx().commit()

    u = System.currentTimeMillis()-t
    print u/1000+" seconds \n"
    g = graph.traversal()
    g.V().has('uid', 1)

}

The schema is as follows:
def defineTestSchema(graph) {
    mgmt = graph.openManagement()
    g = graph.traversal()
    // vertex labels
    userId= mgmt.makeVertexLabel("userId").make()
    // edge labels
    relatedby = mgmt.makeEdgeLabel("relatedby").make()
    // vertex and edge properties
    uid = mgmt.makePropertyKey("uid").dataType(Long.class).cardinality(Cardinality.SET).make()
    // global indices
    //mgmt.buildIndex("byuid", Vertex.class).addKey(uid).indexOnly(userId).buildCompositeIndex()
    mgmt.buildIndex("byuid", Vertex.class).addKey(uid).buildCompositeIndex()
    mgmt.commit()

    //mgmt = graph.openManagement()
    //mgmt.updateIndex(mgmt.getGraphIndex('byuid'), SchemaAction.REINDEX).get()
    //mgmt.commit()
}

The configuration file is janusgraph-hbase-es.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
schema.default=none
storage.hostname=127.0.0.1
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

index.search.elasticsearch.interface=TRANSPORT_CLIENT
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

However, the loading time is still very long:

100: 0.026 seconds
10k: 49.001 seconds
100k: 35.827 seconds
1 million: 379.05 seconds
10 million: error
gremlin> loadTestSchema(graph)
15:59:27 WARN  org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority  - Temporary storage exception while acquiring id block - retrying in PT0.6S: org.janusgraph.diskstorage.TemporaryBackendException: Wrote claim for id block [2880001, 2960001) in PT2.213S => too slow, threshold is: PT0.3S
GC overhead limit exceeded
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.OutOfMemoryError: GC overhead limit exceeded

What I am wondering is:
1) Why does bulk loading seem not to work, even though I have already set storage.batch-loading=true? What else should I set to make bulk loading take effect? Do I need to drop the index in order to speed up bulk loading?
2) How can I fix the "GC overhead limit exceeded" error?

3) At the same time, I am using Kryo + BulkLoaderVertexProgram to load the graph, and the last step failed:

gremlin> graph.compute(SparkGraphComputer).program(blvp).submit().get()
No signature of method: org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.program() is applicable for argument types: (org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder) values: [org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram$Builder@6bb4cc0e]
Possible solutions: program(org.apache.tinkerpop.gremlin.process.computer.VertexProgram), profile(java.util.concurrent.Callable)

Do I need to install TinkerPop 3 alongside JanusGraph to use graph.compute(SparkGraphComputer).program(blvp).submit().get()?
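For what it's worth, that "No signature of method ... program()" error suggests the Builder itself was passed where a built VertexProgram is expected; calling create(graph) on the builder produces the VertexProgram. A hedged sketch (the writeGraph path is an assumed placeholder for your own configuration):

```groovy
// program() wants a VertexProgram, not the Builder; create(graph) builds it.
blvp = BulkLoaderVertexProgram.build().
        writeGraph('conf/janusgraph-hbase-es.properties').   // placeholder path
        create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
```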

Many thanks!

Eliz


Re: professional support for JanusGraph

Lynn Bender <ly...@...>
 

On Fri, Jun 23, 2017 at 8:30 AM, Peter Musial <pmmu...@...> wrote:
Hi All,

Are there companies that provide professional support for production deployments?

Regards,

Peter

--
You received this message because you are subscribed to the Google Groups "JanusGraph users list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


professional support for JanusGraph

Peter Musial <pmmu...@...>
 

Hi All,

Are there companies that provide professional support for production deployments?

Regards,

Peter


Re: Cassandra/HBase storage backend issues

Robert Dale <rob...@...>
 

Jason, thanks for that! I learned something new.  And for those using the latest 0.2-SNAPSHOT, here's the solr 6.6 guide - https://lucene.apache.org/solr/guide/6_6/near-real-time-searching.html#near-real-time-searching

Robert Dale

On Fri, Jun 23, 2017 at 1:45 AM, Jason Plurad <plu...@...> wrote:
Hi Mike,

One thing you should watch out for is making sure that your transaction handling is clean. Check out the TinkerPop docs on Graph Transactions, especially the 3rd paragraph. It helps to do a graph.tx().rollback() before running your queries, and then making sure you commit or rollback when you're done in a try/finally block.
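A minimal sketch of that pattern (the traversal in the middle is only a placeholder for your own work):

```groovy
// Start from a clean transaction, then commit on success or roll back on failure.
graph.tx().rollback()
try {
    g.V().has('name', 'example').iterate()   // placeholder traversal
    graph.tx().commit()
} catch (Exception e) {
    graph.tx().rollback()
    throw e
}
```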

Do those traversals use a mixed index? Keep in mind that there is a refresh interval in ES (Solr has something similar), so if you're querying immediately after inserting the data, changes might not be visible yet.

-- Jason

On Monday, June 19, 2017 at 11:09:33 AM UTC-4, HadoopMarc wrote:
Hi Mike,

Seeing no expert answers until now, I can only provide a general reply. I see the following lines of thinking in explaining your situation:
  • HBase fails in providing row based consistency: extremely unlikely given the many applications that rely on this
  • JanusGraph fails in providing consistency between instances (e.g. using out of date caches). Do you use multiple JanusGraph instances? Or multiple threads that access the same JanusGraph instance?
  • Your application fails in handling exceptions in the right way (e.g. ignoring them)
  • Your application has logic faults: not so likely because you have been debugging for some while.
If you want to proceed with this, could you provide the code you use on GitHub, so others can confirm the behavior and/or inspect the configs? Ideally, you would provide your code in the form of a test like:
 https://github.com/JanusGraph/janusgraph/blob/236dd930a7af35061e393ea8bb1ee6eb65f924b2/janusgraph-hbase-parent/janusgraph-hbase-core/src/test/java/org/janusgraph/graphdb/hbase/HBasePartitionGraphTest.java

Other ideas still welcome!

Marc

Op zondag 18 juni 2017 08:38:02 UTC+2 schreef mi...@...:
Hi! I'm running into an issue and wondering if anyone has tips. I'm using HBase (also tried this with cassandra with the same issue) and running into an issue where preprocessing our data yields inconsistent results. We run through a query and for each vertex with a given property, we run a traversal on it and calculate properties or insert edges that weren't inserted on upload to boost performance of our eventual traversal.

Our tests run perfectly with TinkerGraph, but when using the HBase or Cassandra backend, sometimes the tests fail, sometimes the calculated properties are completely wrong, and sometimes edges aren't created when needed. A preprocess task may depend on the output of a previous preprocess task that may have taken place seconds earlier. I think this is caused by eventual consistency breaking the traversal, but I'm not sure how to get 100% accuracy (where the current preprocess task can be 100% confident it gets the correct value from a previous preprocessing task). 

I create a transaction for each preprocessing operation, then commit it once successful, but this doesn't seem to fix the issues. Any ideas?

Thanks,
Mike



Re: How can i keep the vertex which i want to add is unique?

huu...@...
 

Thank you Jason. We have many duplicated vertices; how can we handle this more effectively with JanusGraph? Currently we query for a vertex, check whether it exists, and then decide what to do next, which is very inefficient, especially when we need to import a large number of vertices and edges.

On Friday, June 23, 2017 at 11:47:34 AM UTC+8, Jason Plurad wrote:

Check out the documentation for creating a unique composite index.

Here's an example Gremlin Console session which creates a unique composite index on name. When you attempt to set a non-unique name, it will throw a SchemaViolationException.

gremlin> graph = JanusGraphFactory.open('inmemory')
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@2eadc9f6
gremlin> name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>name
gremlin> nameIndex = mgmt.buildIndex('nameIndex', Vertex.class).addKey(name).unique().buildCompositeIndex()
==>nameIndex
gremlin> mgmt.commit()
==>null
gremlin> graph.addVertex('name', 'huupon')
==>v[4184]
gremlin> graph.addVertex('name', 'huupon')
Adding this property for key [name] and value [huupon] violates a uniqueness constraint [nameIndex]


-- Jason

On Wednesday, June 21, 2017 at 4:34:22 AM UTC-4, huupon wrote:
Hi, all:

       How can I make sure the vertex I want to add is unique? Get and then add? Are there any other methods to add a unique vertex?


Re: Cassandra/HBase storage backend issues

Jason Plurad <plu...@...>
 

Hi Mike,

One thing you should watch out for is making sure that your transaction handling is clean. Check out the TinkerPop docs on Graph Transactions, especially the 3rd paragraph. It helps to do a graph.tx().rollback() before running your queries, and then making sure you commit or rollback when you're done in a try/finally block.

Do those traversals use a mixed index? Keep in mind that there is a refresh interval in ES (Solr has something similar), so if you're querying immediately after inserting the data, changes might not be visible yet.

-- Jason


On Monday, June 19, 2017 at 11:09:33 AM UTC-4, HadoopMarc wrote:
Hi Mike,

Seeing no expert answers until now, I can only provide a general reply. I see the following lines of thinking in explaining your situation:
  • HBase fails in providing row based consistency: extremely unlikely given the many applications that rely on this
  • JanusGraph fails in providing consistency between instances (e.g. using out of date caches). Do you use multiple JanusGraph instances? Or multiple threads that access the same JanusGraph instance?
  • Your application fails in handling exceptions in the right way (e.g. ignoring them)
  • Your application has logic faults: not so likely because you have been debugging for some while.
If you want to proceed with this, could you provide the code you use on GitHub, so others can confirm the behavior and/or inspect the configs? Ideally, you would provide your code in the form of a test like:
 https://github.com/JanusGraph/janusgraph/blob/236dd930a7af35061e393ea8bb1ee6eb65f924b2/janusgraph-hbase-parent/janusgraph-hbase-core/src/test/java/org/janusgraph/graphdb/hbase/HBasePartitionGraphTest.java

Other ideas still welcome!

Marc

Op zondag 18 juni 2017 08:38:02 UTC+2 schreef mi...@...:
Hi! I'm running into an issue and wondering if anyone has tips. I'm using HBase (also tried this with cassandra with the same issue) and running into an issue where preprocessing our data yields inconsistent results. We run through a query and for each vertex with a given property, we run a traversal on it and calculate properties or insert edges that weren't inserted on upload to boost performance of our eventual traversal.

Our tests run perfectly with TinkerGraph, but when using the HBase or Cassandra backend, sometimes the tests fail, sometimes the calculated properties are completely wrong, and sometimes edges aren't created when needed. A preprocess task may depend on the output of a previous preprocess task that may have taken place seconds earlier. I think this is caused by eventual consistency breaking the traversal, but I'm not sure how to get 100% accuracy (where the current preprocess task can be 100% confident it gets the correct value from a previous preprocessing task). 

I create a transaction for each preprocessing operation, then commit it once successful, but this doesn't seem to fix the issues. Any ideas?

Thanks,
Mike


Re: Streaming graph data

Jason Plurad <plu...@...>
 

The Gephi integration with TinkerPop was done as a Gremlin Console plugin, so it's not cleanly separated out for use from a standalone Java program. Ultimately, it looks like it only uses a couple files, so maybe it wouldn't be too hard to do.

* GephiRemoteAcceptor.groovy
* GephiTraversalVisualizationStrategy.groovy


-- Jason


On Friday, June 16, 2017 at 10:36:19 AM UTC-4, JZ wrote:
Hello,

Does anyone know if there is a way to stream a graph to a viewer such as Gephi from a JanusGraph Java client? When using the Gremlin Console, you can use the tinkerpop.gephi plugin and redirect a graph to Gephi. Is there a way to do that from a Java program that has created a graph? I did not find any mention of this in the documentation.

Thanks

JGZ


Re: MixedIndex naming convention

Jason Plurad <plu...@...>
 

I think all of the answers are already in the docs (see the Note box in the ES Configuration Overview and Index Creation Options). If there are specific ways you think the docs could be improved, it would be good if you opened up an issue and even better if you submitted a pull request.

In your graph configuration, you actually define the indexes and the shards. You can define more than one index by using a different name for [X]. If you don't set the number of shards, it will default to 5 as dictated by Elasticsearch. For example:

# creating an index named "search". your graph mgmt code would use buildMixedIndex("search")
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
index.search.elasticsearch.client-only=true
# default index name is janusgraph if not explicitly configured. this is how ES refers to it.
# default number of shards is 5

# creating an index named "ravi". your graph mgmt code would use buildMixedIndex("ravi")
index.ravi.backend=elasticsearch
index.ravi.hostname=127.0.0.1
index.ravi.elasticsearch.client-only=true
# it's a good idea to set the index-name to the same name
index.ravi.index-name=ravi
# overriding the default number of shards
index.ravi.elasticsearch.create.ext.index.number_of_shards=8

After you initialize your graph, you can verify that 2 indexes are created in ES, with different # shards in this example:

$ curl http://127.0.0.1:9200/_cat/indices?v
health status index      pri rep docs.count docs.deleted store.size pri.store.size
yellow open   janusgraph   5   1          0            0       345b           345b
yellow open   ravi         8   1          0            0       920b           920b


-- Jason


On Tuesday, June 20, 2017 at 8:31:56 AM UTC-4, Ravikumar Govindarajan wrote:
I saw in many places of documentation/tutorials, that the mixed indexes have this

mgmt.buildIndex("vertices", Vertex.class).addKey(key).buildMixedIndex(INDEX_NAME); // With INDEX_NAME ='search', mostly


Will this create one index in Elasticsearch/Solr per property, or are all properties clubbed under a single index? 


Also does JanusGraph partition these text based indexes, in case they get too large?


--

Ravi




Re: When janusgraph can support ES 5.x in the future?

Jason Plurad <plu...@...>
 

The next release will have support for ES 5.x. The code is already integrated on the master branch.

-- Jason


On Tuesday, June 20, 2017 at 11:18:47 PM UTC-4, huupon wrote:
Hi, all 

        We want to use JanusGraph in production, but our system uses HBase 1.2.x and ES 5.3.0, so I would like to know when JanusGraph will support ES 5.x. Will it be in the next version, 0.2.0?


Re: How can i keep the vertex which i want to add is unique?

Jason Plurad <plu...@...>
 

Check out the documentation for creating a unique composite index.

Here's an example Gremlin Console session which creates a unique composite index on name. When you attempt to set a non-unique name, it will throw a SchemaViolationException.

gremlin> graph = JanusGraphFactory.open('inmemory')
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@2eadc9f6
gremlin> name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>name
gremlin> nameIndex = mgmt.buildIndex('nameIndex', Vertex.class).addKey(name).unique().buildCompositeIndex()
==>nameIndex
gremlin> mgmt.commit()
==>null
gremlin> graph.addVertex('name', 'huupon')
==>v[4184]
gremlin> graph.addVertex('name', 'huupon')
Adding this property for key [name] and value [huupon] violates a uniqueness constraint [nameIndex]


-- Jason


On Wednesday, June 21, 2017 at 4:34:22 AM UTC-4, huupon wrote:
Hi, all:

       How can I make sure the vertex I want to add is unique? Get and then add? Are there any other methods to add a unique vertex?


Re: Disabling Indexing Backend

Jason Plurad <plu...@...>
 

Hi Chris,

You likely initialized the default graph previously (Cassandra keyspace named "janusgraph"), and it was initialized with C* + ES.

Set a configuration property for storage.cassandra.keyspace using a non-default keyspace name, otherwise your graph configuration above would connect to the existing default graph.
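For example, building on the code from the question (the keyspace name "mygraph" is an arbitrary choice):

```groovy
Configuration c = new BaseConfiguration();
c.setProperty("gremlin.graph", "org.janusgraph.core.JanusGraphFactory");
c.setProperty("storage.backend", "cassandrathrift");
c.setProperty("storage.cassandra.keyspace", "mygraph"); // non-default keyspace
Graph graph = GraphFactory.open(c);
```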

-- Jason


On Wednesday, June 21, 2017 at 5:26:28 PM UTC-4, Chris Ruppelt wrote:
When initializing JanusGraph, is there a way to disable the indexing backend? The code below always assumes Elasticsearch is used.

        Configuration c = new BaseConfiguration();
        c.setProperty("gremlin.graph", "org.janusgraph.core.JanusGraphFactory");
        c.setProperty("storage.backend", "cassandrathrift");
        Graph graph = GraphFactory.open(c);


Thanks
Chris R


Re: how to load a CSV file into janusgraph

HadoopMarc <m.c.d...@...>
 

Hi Elizabeth,

OK, another resource I dug up by searching for CSV on the gremlin user list:

http://www.datastax.com/dev/blog/powers-of-ten-part-i

Translation to JanusGraph should be straightforward.
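As a rough illustration, a hedged Gremlin Console sketch of loading a CSV of edges into JanusGraph (the "source,label,target" file layout and the 'name' property are assumptions; a composite index on 'name' keeps the lookups fast):

```groovy
// Assumes each CSV line is "sourceName,edgeLabel,targetName" and that a
// composite index on 'name' exists so the get-or-create lookups stay fast.
g = graph.traversal()
new File('/tmp/edges.csv').eachLine { line ->
    def (src, lbl, dst) = line.split(',')
    def v1 = g.V().has('name', src).tryNext().orElseGet { graph.addVertex('name', src) }
    def v2 = g.V().has('name', dst).tryNext().orElseGet { graph.addVertex('name', dst) }
    v1.addEdge(lbl, v2)
}
graph.tx().commit()
```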

HTH,   Marc

Op woensdag 21 juni 2017 11:15:51 UTC+2 schreef Elizabeth:

Hi Marc,

Thanks so much for your information. However, I was wondering whether there is any complete code example of how to use bulk loading in JanusGraph without Hadoop?


Thanks again!
Elis

On Thursday, June 15, 2017 at 9:59:10 PM UTC+8, HadoopMarc wrote:
Hi Elizabeth,

For JanusGraph you should also take into account the TinkerPop documentation. A relevant pointer for you is:
https://groups.google.com/forum/#!searchin/gremlin-users/csv%7Csort:relevance/gremlin-users/AetuGcLiBxo/KW966WAyAQAJ

Cheers,    Marc

Op woensdag 14 juni 2017 18:44:16 UTC+2 schreef Elizabeth:
Hi all,

I am new to JanusGraph. I have dived into the JanusGraph docs for almost two weeks and found nothing.
I could only gather scattered information, and most of the time it prompts some errors.
Could anyone supply a complete example of bulk loading, or of loading a CSV file into JanusGraph, please?
Any little help is appreciated!

Best regards,

Elis.


Re: creating a vertex with a LIST property in a single gremlin statement

Robert Dale <rob...@...>
 

It is supported syntax. It's part of the TinkerPop API.  0.1.0 and 0.1.1 both have the same version of TinkerPop Gremlin.  


Robert Dale

On Tue, Jun 20, 2017 at 2:21 PM, Peter Musial <pmmu...@...> wrote:
Hi All,

(first entry, so please be patient)

Following is more of a gremlin question.  I have a JG schema with a property called status, cardinality list.

status = mgmt.makePropertyKey('status').dataType(String.class).cardinality(Cardinality.LIST).make();

In documentation it has been suggested to do this:

myVertex = graph.addVertex(label,'myVertex')
myVertex.property('status', 'HI')
myVertex.property('status', 'BYE')

which of course it works as advertised.  However, I found that the following shorthand will also work in JanusGraph 0.1.1 (but not in 0.1.0)

graph.addVertex(label,'myVertex', 'status', 'HI', 'status', 'BYE')

Can someone help explain if this is a supported syntax, or simply some syntactic sugar.  Also, I cannot find any documentation on the key differences between JanusGraph 0.1.0 and 0.1.1 with respect to gremlin.

$ echo $CASSANDRA_HOME
/path/janusgraph/apache-cassandra-2.1.17
$ which janusgraph.sh
/path/janusgraph/janusgraph-0.1.1-hadoop2/bin/janusgraph.sh
$ which gremlin.sh
/path/janusgraph/janusgraph-0.1.1-hadoop2/bin/gremlin.sh

Thank you,

Peter

