Re: Reindex job
Joe Obernberger <joseph.o...@...>
Hi Jerry - HBase appears to be healthy. The graph is about 65 million nodes and 74 million edges. -Joe On 6/1/2017 1:05 AM, Jerry He wrote:
Re: Reindex job
Jerry He <jerr...@...>
How big is your graph? Is your HBase healthy? I wonder what the default timeout duration is. It looks like a JanusGraph timeout, not an HBase timeout. Thanks. Jerry
On Wed, May 31, 2017 at 2:36 PM, Joe Obernberger <joseph.o...@...> wrote: Hi All - I have a graph that I'd like to add an index to. I've tried it through gremlin with a smaller graph and the procedure works fine. With a larger graph, however, I get a timeout error:
Re: Janus Graph performing OLAP with Spark/Yarn
sju...@...
I think there are many success stories/snippets out there on this, but no consolidated how-to that I'm aware of. Marc, I'm pretty sure I've seen plenty of examples from you on this across various lists over the years. I can contribute a couple of examples as well if we can get some documentation started on this under JanusGraph.

I've had success getting traversals and vertex programs working using Titan SparkGraphComputer with HBase using both TinkerPop-3.0.1/Spark-1.2 (Yarn/Cloudera) and TinkerPop-3.2.3/Spark-1.6 (Yarn/Cloudera and Mesos), but I haven't tested this out with JanusGraph yet.

Personally I'd recommend you consider running Spark on Mesos instead of Yarn if possible. The configuration is easier in my opinion, and you can have apps running against different versions of Spark, making hardware and software updates much easier and less disruptive.

A few notes in case they're helpful: the first is probably obvious, but I always match the server Spark version exactly to the Spark version in TinkerPop from the relevant JanusGraph/Titan distribution. Also, I've found the spark.executor.extraClassPath property to be crucial to getting things working with both Yarn and Mesos. Jars included there will be at the start of the classpath, which is important when the cluster may have conflicting versions of core/transitive dependencies. I'll usually create a single jar with all dependencies (excluding Spark), put it somewhere accessible on all cluster nodes, and then define spark.executor.extraClassPath pointing to it.
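[Editor's note] To make the classpath advice above concrete, here is a sketch of a Hadoop-graph properties file for SparkGraphComputer against HBase. All paths, hostnames, and memory settings are placeholders, and the exact reader class and config keys should be verified against your JanusGraph distribution:

```properties
# Hypothetical read-hbase.properties for SparkGraphComputer on Yarn.
# Class names, keys, and paths are assumptions -- verify against your distribution.
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=zk1,zk2,zk3
spark.master=yarn-client
spark.executor.memory=4g
# A single fat jar (all JanusGraph deps, excluding Spark) placed first on the
# executor classpath so it wins over conflicting cluster-provided versions.
spark.executor.extraClassPath=/opt/janusgraph/janusgraph-deps-all.jar
```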
Reindex job
Joe Obernberger <joseph.o...@...>
Hi All - I have a graph that I'd like to add an index to. I've tried it through gremlin with a smaller graph and the procedure works fine. With a larger graph, however, I get a timeout error:
gremlin> mgmt.updateIndex(mgmt.getGraphIndex("fullNameIndex"),SchemaAction.REINDEX).get()
17:18:34 ERROR org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor - Exception occured during job execution: {}
org.janusgraph.diskstorage.TemporaryBackendException: Timed out waiting for next row data - storage error likely
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor.run(StandardScannerExecutor.java:150)
        at java.lang.Thread.run(Thread.java:745)
17:18:34 ERROR org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor - Processing thread interrupted while waiting on queue or processing data
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor$Processor.run(StandardScannerExecutor.java:272)
17:18:34 ERROR org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor - Could not load data from storage: {}
java.lang.RuntimeException: java.io.InterruptedIOException
        at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:97)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:650)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore$RowIterator.hasNext(HBaseKeyColumnValueStore.java:295)
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor$DataPuller.run(StandardScannerExecutor.java:325)
Caused by: java.io.InterruptedIOException
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:188)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
        at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:327)
        at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:410)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:371)
        at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:94)
        ... 5 more
[the "Could not load data from storage" trace appears twice more, identical except for ScannerCallableWithReplicas.java:214 instead of :188]
org.janusgraph.diskstorage.TemporaryBackendException: Timed out waiting for next row data - storage error likely
Type ':help' or ':h' for help.
Display stack trace? [yN]n
Any ideas?
I've also tried this through Java code, but I get a different error:
17/05/31 17:14:37 INFO job.IndexRepairJob: Found index fullNameIndex
17/05/31 17:14:37 ERROR scan.StandardScannerExecutor: Exception trying to setup the job:
java.lang.IllegalStateException: Operation cannot be executed because the enclosing transaction is closed
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.verifyOpen(StandardJanusGraphTx.java:299)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getRelationType(StandardJanusGraphTx.java:891)
        at org.janusgraph.graphdb.query.QueryUtil.getType(QueryUtil.java:61)
        at org.janusgraph.graphdb.query.vertex.BasicVertexCentricQueryBuilder.constructQueryWithoutProfile(BasicVertexCentricQueryBuilder.java:456)
        at org.janusgraph.graphdb.query.vertex.BasicVertexCentricQueryBuilder.constructQuery(BasicVertexCentricQueryBuilder.java:399)
        at org.janusgraph.graphdb.olap.QueryContainer$QueryBuilder.relations(QueryContainer.java:129)
        at org.janusgraph.graphdb.olap.QueryContainer$QueryBuilder.edges(QueryContainer.java:165)
        at org.janusgraph.graphdb.olap.job.IndexRepairJob.getQueries(IndexRepairJob.java:216)
        at org.janusgraph.graphdb.olap.VertexJobConverter.getQueries(VertexJobConverter.java:157)
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor.run(StandardScannerExecutor.java:103)
        at java.lang.Thread.run(Thread.java:745)
17/05/31 17:14:37 ERROR job.IndexRepairJob: Transaction commit threw runtime exception:
java.lang.IllegalArgumentException: The transaction has already been closed
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1356)
        at org.janusgraph.graphdb.database.management.ManagementSystem.commit(ManagementSystem.java:235)
        at org.janusgraph.graphdb.olap.job.IndexUpdateJob.workerIterationEnd(IndexUpdateJob.java:133)
        at org.janusgraph.graphdb.olap.job.IndexRepairJob.workerIterationEnd(IndexRepairJob.java:76)
        at org.janusgraph.graphdb.olap.VertexJobConverter.workerIterationEnd(VertexJobConverter.java:118)
        at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor.run(StandardScannerExecutor.java:125)
        at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-14" java.lang.IllegalArgumentException: The transaction has already been closed
        [identical stack trace as the preceding commit exception]
17/05/31 17:14:37 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
17/05/31 17:14:37 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x35b92203553bdf6
17/05/31 17:14:37 INFO zookeeper.ZooKeeper: Session: 0x35b92203553bdf6 closed
17/05/31 17:14:37 INFO zookeeper.ClientCnxn: EventThread shut down
Thanks for any ideas! -Joe
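[Editor's note] For a graph this size, the in-process reindex job can time out as shown above; running the repair as a Hadoop MapReduce job is the usual alternative. A sketch from the Gremlin console, assuming the janusgraph-hadoop module is on the classpath and the config file path is a placeholder:

```groovy
// Sketch: reindex via MapReduce instead of the in-process scanner.
// Assumes janusgraph-hadoop is available; 'conf/janusgraph-hbase.properties' is a placeholder.
graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
mgmt = graph.openManagement()
mr = new MapReduceIndexManagement(graph)
mr.updateIndex(mgmt.getGraphIndex('fullNameIndex'), SchemaAction.REINDEX).get()
mgmt.commit()
// Block until the index reports ENABLED before relying on it in queries.
ManagementSystem.awaitGraphIndexStatus(graph, 'fullNameIndex').status(SchemaStatus.ENABLED).call()
```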
Re: Janus Graph performing OLAP with Spark/Yarn
HadoopMarc <m.c.d...@...>
Hi John, I have plans to try this too, so question seconded. I have TinkerPop-3.1.1 OLAP working on Spark/Yarn (Hortonworks), but the JanusGraph HBase or Cassandra dependencies will make version conflicts harder to handle. Basically, you need to:
- get your cluster configs on your application or console classpath;
- solve version conflicts, i.e. get rid of the lower-version jars where there is a minor version difference, and report to this list if clashing versions differ by a major version number.

I believe the current lib folder of the JanusGraph distribution already has a few duplicate jars with minor version differences (sorry, have not had time to report this). You will hate spark-assembly because it is not easy to remove lower versions from dependencies included in it... Spark has some config options to load user jars first, though.

I still wonder if some Maven guru can help us solve this manual work by adding the entire cluster as a dependency to the JG project and getting the version conflicts at build time instead of at runtime. Also, I might be mistaken in the above and simple configs might solve the question. So, the original question still stands (has anyone ...). Cheers, Marc
On Wednesday, May 31, 2017 at 19:36:01 UTC+2, Joseph Obernberger wrote:
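[Editor's note] The duplicate-jar hunt described above can be scripted. A sketch, assuming a JanusGraph-style lib/ directory and the common artifact-version.jar naming convention:

```shell
# List artifacts present in lib/ under more than one version: strip the
# directory and the trailing -<version>.jar suffix, then print duplicates.
ls lib/*.jar \
  | sed -E 's#.*/##; s/-[0-9][0-9A-Za-z.]*\.jar$//' \
  | sort | uniq -d
```

Each name printed is an artifact shipped in two or more versions, a candidate for removing the lower one.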
Re: Janus Graph performing OLAP with Spark/Yarn
Joe Obernberger <joseph.o...@...>
Hi John - I'm also very interested in how to do this. We recently built a graph stored in HBase, and when we ran g.E().count() from the Gremlin shell, it took some 5+ hours to complete (79 million edges). Is there any 'how to' or getting-started guide on using Spark+YARN with this? Thank you! -Joe
On 5/31/2017 1:06 PM, 'John Helmsen' via JanusGraph users list wrote:
Re: JanusGraph configuration for scala
Misha Brukman <mbru...@...>
Filed https://github.com/JanusGraph/janusgraph/issues/295 to track this.
On Wed, May 31, 2017 at 1:08 PM, 'John Helmsen' via JanusGraph users list <janusgra...@...> wrote:
Re: JanusGraph configuration for scala
John Helmsen <john....@...>
May I second that? It would be helpful to have all of the versions in one place, even if they repeat what TinkerPop lists.
On Wednesday, May 31, 2017 at 11:42:15 AM UTC-4, Misha Brukman wrote:
Janus Graph performing OLAP with Spark/Yarn
John Helmsen <john....@...>
Gentlemen and Ladies, Currently our group is trying to stand up an instance of JanusGraph/Titan that performs OLAP operations using SparkGraphComputer in TinkerPop. To do OLAP, we wish to use Spark with Yarn. So far, however, we have not been able to successfully launch any distributed queries, such as count(), using this approach. While we can post stack traces, etc., I'd like to ask a different question first: has anyone gotten the system to perform Spark operations using YARN? If so, how?
Re: JanusGraph configuration for scala
Misha Brukman <mbru...@...>
Would it make sense to add Scala and Spark versions to the compatibility chart for clarity, even though they're tied to the TinkerPop version?
On Wed, May 31, 2017 at 11:15 AM, Jason Plurad <plu...@...> wrote:
Re: JanusGraph configuration for scala
Jason Plurad <plu...@...>
I'll note that those product versions are a bit ahead of what is listed in the version compatibility matrix. Also, as discussed in a previous thread, the supported Scala version is 2.10.5.
On Tuesday, May 30, 2017 at 8:18:20 AM UTC-4, Prabin acharya wrote:
JanusGraph configuration for scala
Prabin acharya <pra5...@...>
I am trying to configure JanusGraph with Apache Cassandra 3.1 and Solr 6.1.0. I cannot connect to the storage backend from my code, though all of my configuration settings appear correct.
Re: Who is using JanusGraph in production?
Liu-Cheng Xu <xuliuc...@...>
-- Liu-Cheng Xu
Re: Who is using JanusGraph in production?
Misha Brukman <mbru...@...>
Hi Jimmy, I started building a list of companies using JanusGraph in production; you can see the current list here: https://github.com/JanusGraph/janusgraph#users (and the logos at the bottom of http://janusgraph.org) and more additions are on the way. They appear to be happy with JanusGraph, but I'll let them chime in if they want to provide any additional details. BTW, if anyone else is a production user of JanusGraph, please get in touch with me and let's get you added on the list as well! Misha
On Fri, Apr 7, 2017 at 4:03 AM, Jimmy <xuliuc...@...> wrote:
Re: Production users of JanusGraph
Misha Brukman <mbru...@...>
Hi Anurag, I started a list of companies using JanusGraph in production; you can see the current list here: https://github.com/JanusGraph/janusgraph#users (and the logos at the bottom of http://janusgraph.org) and more additions are on the way. They appear to be happy with JanusGraph, but I'll let them chime in if they want to provide any additional details. BTW, if anyone else is a production user of JanusGraph, please get in touch with me and let's get you added as well! Misha
On Wed, Apr 5, 2017 at 12:28 PM, anurag <anurag...@...> wrote:
Re: Bulk loading CPU busy
Rafael Fernandes <luizr...@...>
Can you try setting this property and compare?
query.fast-property=true
Thanks, Rafa
Re: Bulk loading CPU busy
CGen <new...@...>
I put Vertices in a Map, which was the reason for the frequent equals calls. I removed this. Now the hot spots are: java.lang.Object.hashCode[native]() (6.1%), org.janusgraph.graphdb.internal.AbstractElement.equals(Object) (5.3%), java.security.AccessController.doPrivileged[native](java.security.PrivilegedExceptionAction) (4.9%).
On Tuesday, May 23, 2017 at 10:41:15 UTC+3, CGen wrote:
Bulk loading CPU busy
CGen <new...@...>
Hello! I need to load CSV files totaling 1.7 gigabytes into JanusGraph. How can I increase the load speed? I added to the config:
storage.batch-loading=true
schema.default=none
There is almost no disk access, but the CPU is always busy. In the profiler, the hot spot is org.janusgraph.graphdb.internal.AbstractElement.equals (8%).
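[Editor's note] Alongside the batch-loading config above, keeping transactions small usually helps sustained load speed. A sketch of a batched CSV load; the file name, label, property keys, and commit interval are all placeholders, not recommendations:

```groovy
// Sketch: load a CSV with periodic commits so each transaction stays small.
// Assumes the 'item' label and keys exist in the schema (schema.default=none).
graph = JanusGraphFactory.open('conf/janusgraph-batch.properties')  // has storage.batch-loading=true
g = graph.traversal()
count = 0
new File('data.csv').eachLine { line ->
    def (id, name) = line.split(',')
    g.addV('item').property('itemId', id).property('name', name).iterate()
    if (++count % 10000 == 0) g.tx().commit()   // commit every 10k vertices
}
g.tx().commit()
```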
Re: Query requires iterating over all vertices
CGen <new...@...>
Thanks to all. It works as it should.
On Friday, May 19, 2017 at 21:44:46 UTC+3, Jason Plurad wrote:
Re: Query requires iterating over all vertices
Jason Plurad <plu...@...>
Ah, you used indexOnly() to restrict the index by vertex label, so in order to utilize the index, your query must include the vertex label. Either of these would work since ultimately they are optimized to the same traversal:
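[Editor's note] To make the label constraint concrete ('person' and 'name' here are placeholder names, not from the original thread):

```groovy
// Index built with indexOnly() restricted to the 'person' label:
g.V().has('person', 'name', 'alice')              // label-aware lookup, uses the index
g.V().hasLabel('person').has('name', 'alice')     // optimized to the same traversal
g.V().has('name', 'alice')                        // no label constraint: cannot use the index
```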
On Friday, May 19, 2017 at 1:57:25 PM UTC-4, CGen wrote: