|
|
Jason Plurad <plu...@...>
+1 thanks for sharing Marc!
|
|
John Helmsen <john....@...>
HadoopMarc,
It seems that we have two graph classes that need to be created:
The first is a standard JanusGraph object that runs a standard computer. This is able to perform OLTP data pushes and, I assume, standard OLTP queries. It does not, however, interface with Spark, so SparkGraphComputer cannot be used as the graph computer for its traversal object.
The second is a HadoopGraph object that can have SparkGraphComputer activated for its associated traversal source. This can perform the appropriate map-reduce OLAP calculations, but wouldn't be good for putting information into the HBase database.
Is this accurate, or can we create a GraphTraversalSource that can perform both the OLTP data inserts and utilize SparkGraphComputer? If not, could we create both objects simultaneously? Would there be conflicts between the two if there were two simultaneous traversers?
|
|
Hi John,
Your assumption about different types of graph object for OLTP and OLAP is right (at least for JanusGraph; TinkerGraph supports both). I remember examples from the Gremlin users list, though, where OLTP and OLAP were mixed in the same traversal.
It is no problem to have a graph1 and a graph2 graph object open simultaneously. This is also what you do in gremlin-server when you want to serve multiple graphs.
Cheers, Marc
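A minimal Gremlin console sketch of keeping both graph objects open at the same time (the properties file names are placeholders, and the OLAP part assumes the Spark-Gremlin plugin is active):

// OLTP: a regular JanusGraph over HBase, used for inserts and real-time queries
graph1 = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
g1 = graph1.traversal()
g1.addV('person').property('name', 'john').iterate()
graph1.tx().commit()

// OLAP: a HadoopGraph reading the same HBase table via HBaseInputFormat
graph2 = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
g2 = graph2.traversal().withComputer(SparkGraphComputer)
g2.V().count()   // runs as a Spark job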
|
|
Hi, thanks for your post. I followed it, but I ran into a problem:
15:58:49,110 INFO SecurityManager:58 - Changing view acls to: rc
15:58:49,110 INFO SecurityManager:58 - Changing modify acls to: rc
15:58:49,110 INFO SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rc); users with modify permissions: Set(rc)
15:58:49,111 INFO Client:58 - Submitting application 25 to ResourceManager
15:58:49,320 INFO YarnClientImpl:274 - Submitted application application_1500608983535_0025
15:58:49,321 INFO SchedulerExtensionServices:58 - Starting Yarn extension services with app application_1500608983535_0025 and attemptId None
15:58:50,325 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:50,326 INFO Client:58 - client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1500883129115 final status: UNDEFINED tracking URL: http://dl-rc-optd-ambari-master-v-test-2.host.dataengine.com:8088/proxy/application_1500608983535_0025/ user: rc
15:58:51,330 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:52,333 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:53,335 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:54,337 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:55,340 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:56,343 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED) 15:58:56,802 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:58 - ApplicationMaster registered as NettyRpcEndpointRef(null) 15:58:56,822 INFO YarnClientSchedulerBackend:58 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> dl-rc-optd-ambari-master-v-test-1.host.dataengine.com,dl-rc-optd-ambari-master-v-test-2.host.dataengine.com, PROXY_URI_BASES -> http://dl-rc-optd-ambari-master-v-test-1.host.dataengine.com:8088/proxy/application_1500608983535_0025,http://dl-rc-optd-ambari-master-v-test-2.host.dataengine.com:8088/proxy/application_1500608983535_0025), /proxy/application_1500608983535_0025 15:58:56,824 INFO JettyUtils:58 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 15:58:57,346 INFO Client:58 - Application report for application_1500608983535_0025 (state: RUNNING) 15:58:57,347 INFO Client:58 - client token: N/A diagnostics: N/A ApplicationMaster host: 10.200.48.154 ApplicationMaster RPC port: 0 queue: default start time: 1500883129115 final status: UNDEFINED tracking URL: http://dl-rc-optd-ambari-master-v-test-2.host.dataengine.com:8088/proxy/application_1500608983535_0025/ user: rc 15:58:57,348 INFO YarnClientSchedulerBackend:58 - Application application_1500608983535_0025 has started running. 15:58:57,358 INFO Utils:58 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 47514. 15:58:57,358 INFO NettyBlockTransferService:58 - Server created on 47514 15:58:57,360 INFO BlockManagerMaster:58 - Trying to register BlockManager 15:58:57,363 INFO BlockManagerMasterEndpoint:58 - Registering block manager 10.200.48.112:47514 with 2.4 GB RAM, BlockManagerId(driver, 10.200.48.112, 47514)15:58:57,366 INFO BlockManagerMaster:58 - Registered BlockManager
15:58:57,585 INFO EventLoggingListener:58 - Logging events to hdfs:///spark-history/application_1500608983535_0025 15:59:07,177 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:70 - Container marked as failed: container_e170_1500608983535_0025_01_000002 on host: dl-rc-optd-ambari-slave-v-test-1.host.dataengine.com. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_e170_1500608983535_0025_01_000002 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:576) at org.apache.hadoop.util.Shell.run(Shell.java:487) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:371) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1 main : run as user is rc main : requested yarn user is rc
Container exited with a non-zero exit code 1
Display stack trace? [yN]15:59:57,702 WARN TransportChannelHandler:79 - Exception in connection from 10.200.48.155/10.200.48.155:50921 java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:192) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313) at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:748) 15:59:57,704 ERROR TransportResponseHandler:132 - Still have 1 requests outstanding when connection from 10.200.48.155/10.200.48.155:50921 is closed 15:59:57,706 WARN NettyRpcEndpointRef:91 - Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:192) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313) at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:748)
I am confused about that. Could you please help me?
|
|
Joe Obernberger <joseph.o...@...>
Marc - thank you for posting this. I'm trying to get this to
work with our CDH 5.10.0 distribution, but have run into an issue;
first, though, some questions. I'm using a 5 node cluster, and I think
I do not need to set zookeeper.znode.parent since the hbase
configuration is in /etc/conf/hbase. Is that correct?
The error that I'm getting is:
org.apache.spark.SparkException: Job aborted due to stage
failure: Task 1 in stage 0.0 failed 4 times, most recent failure:
Lost task 1.3 in stage 0.0 (TID 10, host002, executor 1):
java.lang.ClassCastException: cannot assign instance of
java.lang.invoke.SerializedLambda to field
org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330
of type org.apache.spark.api.java.function.PairFunction in
instance of
org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
at
java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
at
java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2238)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2112)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
at
java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
at
scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown
Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2123)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
at
java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Given this post:
https://stackoverflow.com/questions/28186607/java-lang-classcastexception-using-lambda-expressions-in-spark-job-on-remote-ser
It looks like I'm not including a necessary jar, but I'm at a
loss as to which one. Any ideas?
For reference, here is part of the config:
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
#janusgraphmr.ioformat.conf.storage.hostname=fqdn1,fqdn2,fqdn3
janusgraphmr.ioformat.conf.storage.hostname=10.22.5.63:2181,10.22.5.64:2181,10.22.5.65:2181
janusgraphmr.ioformat.conf.storage.hbase.table=TEST0.2.0
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=18
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
#zookeeper.znode.parent=/hbase-unsecure
# Security configs are needed in case of a secure cluster
#zookeeper.znode.parent=/hbase-secure
#hbase.rpc.protection=privacy
#hbase.security.authentication=kerberos
#
# SparkGraphComputer with Yarn Configuration
#
spark.master=yarn-client
spark.executor.memory=512m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar,/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.6.0-cdh5.10.0.jar
#spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:./lib.zip/*:
spark.yarn.appMasterEnv.CLASSPATH=/etc/haddop/conf:/etc/hbase/conf:./lib.zip/*:/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.6.0-cdh5.10.0.jar
#spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
Thank you!
-Joe
|
|
Joe Obernberger <joseph.o...@...>
Could this be a networking issue? Maybe a firewall is enabled,
or SELinux is preventing a connection?
I've been able to get this to work, but running a simple count -
g.V().count() - on anything but a very small graph takes a very, very
long time (hours). Are there any cache settings, or other
resources, that could be modified to improve the performance?
The YARN container logs are filled with debug lines about
'Created dirty vertex map with initial size 32', 'Created vertex
cache with max size 20000', and 'Generated HBase Filter
ColumnRange Filter'. Can any of these things be adjusted in the
properties file? Thank you!
-Joe
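For what it's worth, the 'vertex cache' and 'dirty vertex map' debug lines appear to correspond to JanusGraph's transaction-level cache options (cache.tx-cache-size and cache.tx-dirty-size in the configuration reference). A hedged sketch of how they might be passed along in the graph properties file - whether the janusgraphmr.ioformat.conf prefix forwards them to the HBaseInputFormat reader transactions is an assumption, and the values are examples only:

# transaction-level caches used while deserializing vertices (example values)
janusgraphmr.ioformat.conf.cache.tx-cache-size=5000
janusgraphmr.ioformat.conf.cache.tx-dirty-size=32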
|
|
Hi ... and others, I have been offline for a few weeks enjoying a holiday and will start looking into your questions and make the suggested corrections. Thanks for following the recipes and for helping others with them.
..., did you run the recipe on the same HDP sandbox and the same TinkerPop version? I remember (from 4 weeks ago) that copying the zookeeper.znode.parent property from the HBase configs to the JanusGraph configs was essential to get JanusGraph's HBaseInputFormat working (that is: to read graph data for the Spark tasks).
Cheers, Marc
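For reference, in the HadoopGraph properties file this is a single extra line, with the value copied from the cluster's hbase-site.xml (the /hbase-unsecure value below is just the one from the commented-out lines in Joe's config; a secured cluster would typically use /hbase-secure):

zookeeper.znode.parent=/hbase-unsecure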
|
|
Joe Obernberger <joseph.o...@...>
Hi Marc - I've been able to get it to run longer, but am now
getting a RowTooBigException from HBase. How does JanusGraph
store data in HBase? The current max size of a row is 1 GByte,
which makes me think this error is covering something else up.
What I'm seeing so far in testing with a 5 server cluster - each
machine with 128G of RAM:
One HBase table is 1.5G in size, split across 7 regions, and has
20,001,105 rows. A g.V().count() takes 2 hours and results
in 3,842,755 vertices.
Another HBase table is 5.7G in size, split across 10 regions, has
57,620,276 rows, and took 6.5 hours to run the count, resulting
in 10,859,491 nodes. When running, it looks like it hits one
server very hard even though the YARN tasks are distributed across
the cluster. One HBase node gets hammered.
The RowTooBigException is below. Anything to try? Thank you for
any help!
org.janusgraph.core.JanusGraphException: Could not process
individual retrieval call
at
org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:257)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1269)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1137)
at
org.janusgraph.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:209)
at
org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:75)
at
org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
at
org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
at
org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
at
com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at
org.janusgraph.hadoop.formats.util.input.current.JanusGraphHadoopSetupImpl.getTypeInspector(JanusGraphHadoopSetupImpl.java:60)
at
org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.<init>(JanusGraphVertexDeserializer.java:55)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat.lambda$static$0(GiraphInputFormat.java:49)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat$RefCountedCloseable.acquire(GiraphInputFormat.java:100)
at
org.janusgraph.hadoop.formats.util.GiraphRecordReader.<init>(GiraphRecordReader.java:47)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat.createRecordReader(GiraphInputFormat.java:67)
at
org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:166)
at
org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
at
org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at
org.apache.spark.scheduler.Task.run(Task.scala:89)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.janusgraph.core.JanusGraphException: Could not call
index
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1262)
at
org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:255)
... 34 more
Caused by: org.janusgraph.core.JanusGraphException: Could not
execute operation due to backend exception
at
org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
at
org.janusgraph.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:444)
at
org.janusgraph.diskstorage.BackendTransaction.indexQuery(BackendTransaction.java:395)
at
org.janusgraph.graphdb.query.graph.MultiKeySliceQuery.execute(MultiKeySliceQuery.java:51)
at
org.janusgraph.graphdb.database.IndexSerializer.query(IndexSerializer.java:529)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.lambda$call$5(StandardJanusGraphTx.java:1258)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:97)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:89)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:81)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1258)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1255)
at
com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
at
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at
com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
at
com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1255)
... 35 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException:
Could not successfully complete backend operation due to repeated
temporary exceptions after PT10S
at
org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
at
org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
... 53 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException:
Temporary failure in storage backend
at
org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getHelper(HBaseKeyColumnValueStore.java:202)
at
org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getSlice(HBaseKeyColumnValueStore.java:90)
at
org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
at
org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
at
org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:398)
at
org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:395)
at
org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
... 54 more
Caused by:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
after attempts=35, exceptions:
Sat Aug 05 07:22:03 EDT 2017,
RpcRetryingCaller{globalStartTime=1501932111280, pause=100,
retries=35},
org.apache.hadoop.hbase.regionserver.RowTooBigException:
rg.apache.hadoop.hbase.regionserver.RowTooBigException: Max row
size allowed: 1073741824, but the row is bigger than that.
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:564)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5697)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5856)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5634)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5611)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5597)
at
org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6792)
at
org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6770)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2023)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644)
at
org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
at
org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
|
|
Hi Joseph,
You ran into terrain I have not yet covered myself. Up till now I have been using the graben1437 PR for Titan, and for OLAP I adopted a poor man's approach where node ids are distributed over Spark tasks and each Spark executor makes its own Titan/HBase connection. This performs well, but does not have the nice abstraction of the HBaseInputFormat (a rough sketch of this idea follows below).
So, no clear answer to this one, but just some thoughts:
- could you try to move some regions manually and see what it does to performance?
- how do your OLAP vertex count times compare to the OLTP count times?
- how does the sum of Spark task execution times compare to the YARN start-to-end time difference you reported? In other words, how much of the start-to-end time is spent waiting for timeouts?
- unless you managed to create a vertex with > 1GB size, the RowTooBigException sounds like a bug (which you can report on JanusGraph's github page). HBase does not like large rows at all, so vertex/edge properties should not have blob values.
@(David Robinson): do you have any additional thoughts on this?
Cheers, Marc
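A rough Gremlin console sketch of that poor man's approach, with a local thread pool standing in for the Spark executors (the properties file name, the 4-way split, and the out-edge count are placeholders; fetching all vertex ids on the driver is only workable for modest graphs):

import org.janusgraph.core.JanusGraphFactory
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// driver side: fetch the vertex ids once and split them into chunks
driverGraph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
ids = driverGraph.traversal().V().id().toList()
driverGraph.close()
chunks = ids.collate((int) Math.ceil(ids.size() / 4.0d))

// "executor" side: every task opens its own JanusGraph/HBase connection
pool = Executors.newFixedThreadPool(4)
tasks = chunks.collect { chunk ->
    { ->
        def graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
        def cnt = graph.traversal().V(chunk.toArray()).outE().count().next()
        graph.close()
        cnt
    } as Callable
}
println 'total out-edges: ' + pool.invokeAll(tasks)*.get().sum()
pool.shutdown()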
toggle quoted messageShow quoted text
Hi Marc - I've been able to get it to run longer, but am now
getting a RowTooBigException from HBase. How does JanusGraph
store data in HBase? The current max size of a row in 1GByte,
which makes me think this error is covering something else up.
What I'm seeing so far in testing with a 5 server cluster - each
machine with 128G of RAM:
HBase table is 1.5G in size, split across 7 regions, and has
20,001,105 rows. To do a g.V().count() takes 2 hours and results
in 3,842,755 verticies.
Another HBase table is 5.7G in size split across 10 regions, is
57,620,276 rows, and took 6.5 hours to run the count and results
in 10,859,491 nodes. When running, it looks like it hits one
server very hard even though the YARN tasks are distributed across
the cluster. One HBase node gets hammered.
The RowTooBigException is below. Anything to try? Thank you for
any help!
org.janusgraph.core.JanusGraphException: Could not process
individual retrieval call
at
org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:257)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1269)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1137)
at
org.janusgraph.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:209)
at
org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:75)
at
org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
at
org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
at
org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
at
com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at
org.janusgraph.hadoop.formats.util.input.current.JanusGraphHadoopSetupImpl.getTypeInspector(JanusGraphHadoopSetupImpl.java:60)
at
org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.<init>(JanusGraphVertexDeserializer.java:55)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat.lambda$static$0(GiraphInputFormat.java:49)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat$RefCountedCloseable.acquire(GiraphInputFormat.java:100)
at
org.janusgraph.hadoop.formats.util.GiraphRecordReader.<init>(GiraphRecordReader.java:47)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat.createRecordReader(GiraphInputFormat.java:67)
at
org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:166)
at
org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
at
org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at
org.apache.spark.scheduler.Task.run(Task.scala:89)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.janusgraph.core.JanusGraphException: Could not call
index
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1262)
at
org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:255)
... 34 more
Caused by: org.janusgraph.core.JanusGraphException: Could not
execute operation due to backend exception
at
org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
at
org.janusgraph.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:444)
at
org.janusgraph.diskstorage.BackendTransaction.indexQuery(BackendTransaction.java:395)
at
org.janusgraph.graphdb.query.graph.MultiKeySliceQuery.execute(MultiKeySliceQuery.java:51)
at
org.janusgraph.graphdb.database.IndexSerializer.query(IndexSerializer.java:529)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.lambda$call$5(StandardJanusGraphTx.java:1258)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:97)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:89)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:81)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1258)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1255)
at
com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
at
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at
com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
at
com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1255)
... 35 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException:
Could not successfully complete backend operation due to repeated
temporary exceptions after PT10S
at
org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
at
org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
... 53 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException:
Temporary failure in storage backend
at
org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getHelper(HBaseKeyColumnValueStore.java:202)
at
org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getSlice(HBaseKeyColumnValueStore.java:90)
at
org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
at
org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
at
org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:398)
at
org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:395)
at
org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
... 54 more
Caused by:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
after attempts=35, exceptions:
Sat Aug 05 07:22:03 EDT 2017,
RpcRetryingCaller{globalStartTime=1501932111280, pause=100,
retries=35},
org.apache.hadoop.hbase.regionserver.RowTooBigException:
rg.apache.hadoop.hbase.regionserver.RowTooBigException: Max row
size allowed: 1073741824, but the row is bigger than that.
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:564)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5697)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5856)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5634)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5611)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5597)
at
org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6792)
at
org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6770)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2023)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644)
at
org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
at
org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
On 8/6/2017 3:50 PM, HadoopMarc wrote:
Hi ... and others, I have been offline for a few
weeks enjoying a holiday and will start looking into your
questions and make the suggested corrections. Thanks for
following the recipes and helping others with it.
..., did you run the recipe on the same HDP sandbox and same TinkerPop version? I remember (from 4 weeks ago) that copying the zookeeper.znode.parent property
from the HBase configs to the JanusGraph configs was essential
to get JanusGraph's HBaseInputFormat working (that is: read
graph data for the Spark tasks).
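Concretely, that is just one extra line in the properties file used for the HBaseInputFormat; the value below is only an illustration (e.g. for an HDP sandbox) and must match whatever zookeeper.znode.parent says in the cluster's hbase-site.xml:
zookeeper.znode.parent=/hbase-unsecure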
Cheers, Marc
On Monday, July 24, 2017 at 10:12:13 AM UTC+2, spi...@... wrote:
hi,Thanks for your post.
I did it according to the post.But I ran into a
problem.
15:58:49,110 INFO SecurityManager:58 -
Changing view acls to: rc
15:58:49,110 INFO SecurityManager:58 -
Changing modify acls to: rc
15:58:49,110 INFO SecurityManager:58 -
SecurityManager: authentication disabled; ui
acls disabled; users with view permissions:
Set(rc); users with modify permissions: Set(rc)
15:58:49,111 INFO Client:58 - Submitting
application 25 to ResourceManager
15:58:49,320 INFO YarnClientImpl:274 -
Submitted application
application_1500608983535_0025
15:58:49,321 INFO
SchedulerExtensionServices:58 - Starting Yarn
extension services with app
application_1500608983535_0025 and attemptId
None
15:58:50,325 INFO Client:58 - Application
report for application_1500608983535_0025
(state: ACCEPTED)
15:58:50,326 INFO Client:58 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1500883129115
final status: UNDEFINED
user: rc
15:58:51,330 INFO Client:58 - Application
report for application_1500608983535_0025
(state: ACCEPTED)
15:58:52,333 INFO Client:58 - Application
report for application_1500608983535_0025
(state: ACCEPTED)
15:58:53,335 INFO Client:58 - Application
report for application_1500608983535_0025
(state: ACCEPTED)
15:58:54,337 INFO Client:58 - Application
report for application_1500608983535_0025
(state: ACCEPTED)
15:58:55,340
INFO Client:58 - Application report for
application_1500608983535_0025 (state: ACCEPTED)
15:58:56,343 INFO Client:58 - Application
report for application_1500608983535_0025 (state:
ACCEPTED)
15:58:56,802 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:58
- ApplicationMaster registered as
NettyRpcEndpointRef(null)
15:58:56,824 INFO JettyUtils:58 - Adding
filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15:58:57,346 INFO Client:58 - Application
report for application_1500608983535_0025 (state:
RUNNING)
15:58:57,347 INFO Client:58 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.200.48.154
ApplicationMaster RPC port: 0
queue: default
start time: 1500883129115
final status: UNDEFINED
user: rc
15:58:57,348 INFO
YarnClientSchedulerBackend:58 - Application
application_1500608983535_0025 has started
running.
15:58:57,358 INFO Utils:58 - Successfully
started service 'org.apache.spark.network.netty.NettyBlockTransferService'
on port 47514.
15:58:57,358 INFO NettyBlockTransferService:58
- Server created on 47514
15:58:57,360 INFO BlockManagerMaster:58 -
Trying to register BlockManager
15:58:57,363 INFO BlockManagerMasterEndpoint:58 - Registering block manager 10.200.48.112:47514 with 2.4 GB RAM, BlockManagerId(driver, 10.200.48.112, 47514)
15:58:57,366 INFO BlockManagerMaster:58 - Registered BlockManager
15:58:57,585 INFO EventLoggingListener:58 -
Logging events to hdfs:///spark-history/application_1500608983535_0025
15:59:07,177 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:70 - Container marked as failed: container_e170_1500608983535_0025_01_000002 on host: dl-rc-optd-ambari-slave-v-test-1.host.dataengine.com.
Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e170_1500608983535_0025_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at
org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at
org.apache.hadoop.util.Shell.run(Shell.java:487)
at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:371)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at
java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at
java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : run as user is rc
main : requested yarn user is rc
Container exited with a non-zero exit code 1
java.io.IOException: Connection reset by peer
at
sun.nio.ch.FileDispatcherImpl.read0(Native
Method)
at
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at
sun.nio.ch.IOUtil.read(IOUtil.java:192)
at
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at
io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at
java.lang.Thread.run(Thread.java:748)
15:59:57,706 WARN NettyRpcEndpointRef:91 -
Error sending message [message =
RequestExecutors(0,0,Map())] in 1 attempts
java.io.IOException: Connection reset by peer
at
sun.nio.ch.FileDispatcherImpl.read0(Native
Method)
at
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at
sun.nio.ch.IOUtil.read(IOUtil.java:192)
at
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at
io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at
java.lang.Thread.run(Thread.java:748)
I am confused about that. Could you please help me?
On Thursday, July 6, 2017 at 4:15:37 PM UTC+8, HadoopMarc wrote:
|
|
Joe Obernberger <joseph.o...@...>
Hi Marc - thank you very much for your reply. I like your idea
about moving regions manually and will try that. As to OLAP vs
OLTP (I assume Spark vs none), yes I have those times.
For a 1.5G table in HBase, the count using just the Gremlin shell,
without SparkGraphComputer:
graph = JanusGraphFactory.open('conf/graph.properties')
g = graph.traversal()
g.V().count()
takes just under 1 minute. Using Spark it takes about 2 hours,
so something isn't right. They both return 3,842,755 vertices.
When I run it with Spark, it hits one of the region servers hard -
doing over 30k requests per second for those 2 hours.
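For reference, the Spark run is started the same way, just against a HadoopGraph with the SparkGraphComputer; the properties file name below is only a placeholder for my HBaseInputFormat/Spark config:
graph = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
g = graph.traversal(computer(SparkGraphComputer))
g.V().count()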
-Joe
On 8/8/2017 3:17 AM, HadoopMarc wrote:
Hi Joseph,
You ran into terrain I have not yet covered myself. Up till now
I have been using the graben1437 PR for Titan, and for
OLAP I adopted a poor man's approach where node ids are
distributed over Spark tasks and each Spark executor makes its
own Titan/HBase connection. This performs well, but does not
have the nice abstraction of the HBaseInputFormat.
So, no clear answer to this one, but just some thoughts:
- could you try to move some regions manually and see
what it does to performance?
- how do your OLAP vertex count times compare to the OLTP count
times?
- how does the sum of spark task execution times compare to the
yarn start-to-end time difference you reported? In other words,
how much of the start-to-end time is spent in waiting for
timeouts?
- unless you managed to create a vertex with > 1GB size, the
RowTooBigException sounds like a bug (which you can report on
JanusGraph's github page). HBase does not like
large rows at all, so vertex/edge properties should not have
blob values.
@(David Robinson): do you have any additional thoughts on this?
Cheers, Marc
|
|
Hi Marc,
Requesting your help: I am running JanusGraph with MapR-DB (M7) as the backend, and I have successfully been able to create the GodofGraphs example there.
But the following fails when the cluster is MapR and Spark runs on YARN:
plugin activated: tinkerpop.tinkergraph
gremlin> graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
==>hadoopgraph[gryoinputformat->nulloutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().count()
hadoop-load.properties (I tried all the different combinations commented out below; each time it is the same error):
#
# SparkGraphComputer Configuration
#
spark.master=yarn-client
spark.yarn.queue=cmp
mapred.job.queue.name=cmp
#spark.driver.allowMultipleContexts=true
#spark.executor.memory=4g
#spark.ui.port=20000
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore*.jar:/opt/mapr/lib/libprotodefs*.jar:/opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
#spark.executor.instances=10
#spark.executor.cores=2
#spark.executor.CoarseGrainedExecutorBackend.cores=2
#spark.executor.CoarseGrainedExecutorBackend.driver=FIXME
#spark.executor.CoarseGrainedExecutorBackend.stopping=false
#spark.streaming.stopGracefullyOnShutdown=true
#spark.yarn.driver.memoryOverhead=4g
#spark.yarn.executor.memoryOverhead=1024
#spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.0.0-2557
--------------------------------------------------------------------------------------------------------------
yarn log
------------------------------------------------------------------------------------------------------------
When the last command is executed, the driver in the YARN container abruptly shuts down, taking the Spark context with it, with the following error in the YARN logs:
Container: container_e27_1501284102300_47651_01_000008 on abcd.com_8039
LogType:stderr
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:2441
Log Contents:
17/07/31 14:08:05 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
17/07/31 14:08:05 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/07/31 14:08:06 INFO Remoting: Starting remoting
17/07/31 14:08:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutorActorSystem@...:36376]
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 36376.
17/07/31 14:08:06 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-mapr/nm-local-dir/usercache/cmphs/appcache/application_1501284102300_47651/blockmgr-244e0062-016e-4402-85c9-69f2ab9ef9d2
17/07/31 14:08:06 INFO storage.MemoryStore: MemoryStore started with capacity 2.7 GB
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@...:43768
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
17/07/31 14:08:06 INFO executor.Executor: Starting executor ID 7 on host abcd.com
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44208.
17/07/31 14:08:06 INFO netty.NettyBlockTransferService: Server created on 44208
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Registered BlockManager
17/07/31 14:08:41 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/07/31 14:08:41 INFO storage.MemoryStore: MemoryStore cleared
17/07/31 14:08:41 INFO storage.BlockManager: BlockManager stopped
17/07/31 14:08:41 INFO util.ShutdownHookManager: Shutdown hook called
End of LogType:stderr
LogType:stdout
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:0
Log Contents:
End of LogType:stdout
I am stuck with this error; any help here would be appreciated.
|
|
Joe Obernberger <joseph.o...@...>
Could you let us know a little more about your configuration?
What is your storage backend for JanusGraph (HBase/Cassandra)? I
actually do not see an error in your log, but at the very least
you'll need to define spark.executor.extraClassPath to point to
the various jars required. Are there other logs you can look at,
such as the container logs for YARN?
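For example, something along these lines (only a sketch; the exact entries depend on your MapR layout and on where the JanusGraph lib directory ends up on the executors, so the first path is just a placeholder):
spark.executor.extraClassPath=/path/to/janusgraph-lib/*:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*
Mirroring what you already pass in spark.yarn.appMasterEnv.CLASSPATH is usually a reasonable starting point.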
I assume you've seen Marc's blog post:
http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html
-Joe
|
|
Joe Obernberger <joseph.o...@...>
Hi Marc - I did try splitting regions. What happens when I run
the SparkGraphComputer job is that it seems to hit one region
server hard, then move on to the next; it appears to run serially.
All that said, I think the big issue is that running the count
with Spark takes 2 hours and running without takes 2 minutes.
Perhaps SparkGraphComputer is not the tool to use for handling
large graphs? Although I'm not sure what its purpose is then.
For reference, here is my Hadoop configuration file using
Cloudera CDH 5.10.0 and JanusGraph 0.1.0:
Config:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
log4j.rootLogger=WARNING, STDOUT
log4j.logger.deng=WARNING
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
org.slf4j.simpleLogger.defaultLogLevel=warn
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=host003:2181,host004:2181,host005:2181
janusgraphmr.ioformat.conf.storage.hbase.table=Large2
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
# Security configs are needed in case of a secure cluster
zookeeper.znode.parent=/hbase
#
# SparkGraphComputer with Yarn Configuration
#
#spark.master=yarn-client
spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=100M
-XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
-Dlogback.configurationFile=logback.xml
spark.driver.extraJavaOptons=-XX:ReservedCodeCacheSize=100M
-XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.master=yarn-cluster
spark.executor.memory=10240m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.1.0-SNAPSHOT-hadoop2/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar,/home/graph/janusgraph-0.1.0-SNAPSHOT-hadoop2/conf/logback.xml
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar
spark.yarn.appMasterEnv.CLASSPATH=/etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.akka.frameSize=1024
spark.kyroserializer.buffer.max=1600m
spark.network.timeout=90000
spark.executor.heartbeatInterval=100000
spark.cores.max=64
#
# Relevant configs from spark-defaults.conf
#
spark.authenticate=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.killEnabled=true
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar:./lib.zip/*:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../conf:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/..:\
/opt/cloudera/parcels/CDH/lib/spark/assembly/lib/spark-assembly.jar:\
/usr/lib/jvm/java-openjdk/jre/lib/rt.jar:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:\
/etc/hadoop/conf:\
/etc/hbase/conf:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop/.//*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop-yarn/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop-yarn/.//*
spark.eventLog.dir=hdfs://host001:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=http://host001:18088
spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/lib/spark-assembly.jar
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.master=yarn-client
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thanks very much for any input / further dialog!
-Joe
On 8/8/2017 3:17 AM, HadoopMarc wrote:
On Monday, July 24, 2017 at 10:12:13 AM UTC+2, spi...@... wrote:
hi,Thanks for your post.
I did it according to the post.But I ran into a
problem.
15:58:49,110 INFO SecurityManager:58
- Changing view acls to: rc
15:58:49,110 INFO SecurityManager:58
- Changing modify acls to: rc
15:58:49,110 INFO SecurityManager:58
- SecurityManager: authentication
disabled; ui acls disabled; users with
view permissions: Set(rc); users with
modify permissions: Set(rc)
15:58:49,111 INFO Client:58 -
Submitting application 25 to
ResourceManager
15:58:49,320 INFO YarnClientImpl:274
- Submitted application
application_1500608983535_0025
15:58:49,321 INFO
SchedulerExtensionServices:58 - Starting
Yarn extension services with app
application_1500608983535_0025 and
attemptId None
15:58:50,325 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:50,326 INFO Client:58 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1500883129115
final status: UNDEFINED
user: rc
15:58:51,330 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:52,333 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:53,335 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:54,337 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:55,340
INFO Client:58 - Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:56,343 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:56,802 INFO
YarnSchedulerBackend$YarnSchedulerEndpoint:58
- ApplicationMaster registered as
NettyRpcEndpointRef(null)
15:58:56,824 INFO JettyUtils:58 -
Adding filter:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15:58:57,346 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
RUNNING)
15:58:57,347 INFO Client:58 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.200.48.154
ApplicationMaster RPC port: 0
queue: default
start time: 1500883129115
final status: UNDEFINED
user: rc
15:58:57,348 INFO
YarnClientSchedulerBackend:58 -
Application application_1500608983535_0025
has started running.
15:58:57,358 INFO Utils:58 -
Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService'
on port 47514.
15:58:57,358 INFO
NettyBlockTransferService:58 - Server
created on 47514
15:58:57,360 INFO
BlockManagerMaster:58 - Trying to register
BlockManager
15:58:57,363 INFO
BlockManagerMasterEndpoint:58 -
Registering block manager 10.200.48.112:47514
with 2.4 GB RAM, BlockManagerId(driver,
10.200.48.112, 47514)15:58:57,366 INFO
BlockManagerMaster:58 - Registered
BlockManager
15:58:57,585 INFO
EventLoggingListener:58 - Logging events
to hdfs:///spark-history/application_1500608983535_0025
15:59:07,177 WARN
YarnSchedulerBackend$ YarnSchedulerEndpoint:70
- Container marked as failed:
container_e170_1500608983535_ 0025_01_000002
on host: dl-rc-optd-ambari-slave-v-test-1.host.dataengine.com.
Exit status: 1. Diagnostics: Exception
from container-launch.
Container id:
container_e170_1500608983535_0025_01_000002
Exit code: 1
Stack trace: ExitCodeException
exitCode=1:
at
org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at
org.apache.hadoop.util.Shell.run(Shell.java:487)
at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:371)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at
java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at
java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : run as user is rc
main : requested yarn user is rc
Container exited with a non-zero exit
code 1
java.io.IOException: Connection reset
by peer
at
sun.nio.ch.FileDispatcherImpl.read0(Native
Method)
at
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at
sun.nio.ch.IOUtil.read(IOUtil.java:192)
at
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at
io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at
java.lang.Thread.run(Thread. java:748)
15:59:57,706 WARN
NettyRpcEndpointRef:91 - Error sending
message [message =
RequestExecutors(0,0,Map())] in 1
attempts
java.io.IOException: Connection reset
by peer
at
sun.nio.ch.FileDispatcherImpl.read0(Native
Method)
at
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at
sun.nio.ch.IOUtil.read(IOUtil.java:192)
at
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at
io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at
java.lang.Thread.run(Thread.java:748)
I am confused about that. Could you please help
me?
On Thursday, July 6, 2017 at 4:15:37 PM UTC+8, HadoopMarc wrote:
|
|
Hi Gari and Joe,
Glad to see you testing the recipes for MapR and Cloudera respectively! I am sure that you realized by now that getting this to work is like walking through a minefield. If you deviate from the known path, the odds for getting through are dim, and no one wants to be in your vicinity. So, if you see a need to deviate (which there may be for the hadoop distributions you use), you will need your mine sweeper, that is, put the logging level to DEBUG for relevant java packages (a sketch of what that can look like follows below).
This is where you deviated:
- for Gari: you put all kinds of MapR lib folders on the application master's classpath (other classpath configs are not visible from your post)
- for Joe: you put all kinds of Cloudera lib folders on the executors' classpath (worst of all the spark-assembly.jar)
Probably, you experience all kinds of mismatches in netty libraries which slow down or even kill all comms between the yarn containers. The philosophy of the recipes really is to only add the minimum number of conf folders and jars to the Tinkerpop/Janusgraph distribution and see from there if any libraries are missing.
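A minimal sketch of such a debug setting, mirroring the log4j lines already used in the graph properties files in this thread (whether the executors actually pick these up depends on how logging is initialized on your cluster; in setups that ship a logback.xml via spark.yarn.dist.files, the equivalent change belongs in that file instead):
log4j.rootLogger=DEBUG, STDOUT
log4j.logger.org.janusgraph=DEBUG
log4j.logger.org.apache.spark=DEBUG
log4j.logger.io.netty=DEBUG
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender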
At my side, it has become apparent that I should at least add to the recipes:
- proof of work for a medium-sized graph (say 10M vertices and edges)
- configs for the number of executors present in the OLAP job (instead of relying on spark default number of 2)
So, still some work to do!
Cheers, Marc
On Wednesday, August 9, 2017 at 7:16:23 PM UTC+2, Joseph Obernberger wrote:
Hi Marc - I did try splitting regions. What happens when I run
the SparkGraphComputer job is that it seems to hit one region
server hard, then moves on to the next; it appears to run serially.
All that said, I think the big issue is that running the count
with Spark takes 2 hours and running it without takes 2 minutes.
Perhaps SparkGraphComputer is not the tool to be used for
handling large graphs? Although, if not, I'm not sure what its purpose is
then?
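For readers following along, the two counts being compared boil down to something like the following console session (a sketch only; the OLTP properties file name is the one used elsewhere in this thread, while the HadoopGraph file name is illustrative and should point at the HBaseInputFormat configuration shown below):
// OLTP count, straight against HBase
graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
graph.traversal().V().count().next()
// OLAP count of the same data through SparkGraphComputer over a HadoopGraph
hadoopGraph = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
hadoopGraph.traversal().withComputer(SparkGraphComputer).V().count().next()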
For reference, here is my Hadoop configuration file using
Cloudera CDH 5.10.0 and JanusGraph 0.1.0:
Config:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
log4j.rootLogger=WARNING, STDOUT
log4j.logger.deng=WARNING
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
org.slf4j.simpleLogger.defaultLogLevel=warn
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=host003:2181,host004:2181,host005:2181
janusgraphmr.ioformat.conf.storage.hbase.table=Large2
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
# Security configs are needed in case of a secure cluster
zookeeper.znode.parent=/hbase
#
# SparkGraphComputer with Yarn Configuration
#
#spark.master=yarn-client
spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.driver.extraJavaOptons=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.master=yarn-cluster
spark.executor.memory=10240m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.1.0-SNAPSHOT-hadoop2/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar,/home/graph/janusgraph-0.1.0-SNAPSHOT-hadoop2/conf/logback.xml
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar
spark.yarn.appMasterEnv.CLASSPATH=/etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.akka.frameSize=1024
spark.kyroserializer.buffer.max=1600m
spark.network.timeout=90000
spark.executor.heartbeatInterval=100000
spark.cores.max=64
#
# Relevant configs from spark-defaults.conf
#
spark.authenticate=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.killEnabled=true
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar:./lib.zip/*:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../conf:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/..:\
/opt/cloudera/parcels/CDH/lib/spark/assembly/lib/spark-assembly.jar:\
/usr/lib/jvm/java-openjdk/jre/lib/rt.jar:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:\
/etc/hadoop/conf:\
/etc/hbase/conf:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop/.//*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop-yarn/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop-yarn/.//*
spark.eventLog.dir=hdfs://host001:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=http://host001:18088
spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/lib/spark-assembly.jar
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.master=yarn-client
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thanks very much for any input / further dialog!
-Joe
On 8/8/2017 3:17 AM, HadoopMarc wrote:
Hi Joseph,
You ran into terrain I have not yet covered myself. Up till now
I have been using the graben1437 PR for Titan and for
OLAP I adopted a poor man's approach where node id's are
distributed over spark tasks and each spark executor makes its
own Titan/HBase connection. This performs well, but does not
have the nice abstraction of the HBaseInputFormat.
So, no clear answer to this one, but just some thoughts:
- could you try to move some regions manually and see
what it does to performance?
- how do your OLAP vertex count times compare to the OLTP count
times?
- how does the sum of spark task execution times compare to the
yarn start-to-end time difference you reported? In other words,
how much of the start-to-end time is spent in waiting for
timeouts?
- unless you managed to create a vertex with > 1GB size, the
RowTooBigException sounds like a bug (which you can report on
JanusGraph's github page). HBase does not like
large rows at all, so vertex/edge properties should not have
blob values (a rough console check for such oversized vertices follows below).
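As a side note, the 1073741824 limit in the stack trace is HBase's default hbase.table.max.rowsize, and JanusGraph stores each vertex's adjacency list in a single row, so a vertex with an extreme edge count or huge property values is the kind of thing that could trip it. A rough way to look for such vertices from the console (a sketch only; this is a full OLTP scan and will be slow on a graph of this size):
g = graph.traversal()
g.V().order().by(bothE().count(), decr).limit(5).valueMap(true)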
@(David Robinson): do you have any additional thoughts on this?
Cheers, Marc
On Monday, August 7, 2017 at 11:12:02 PM UTC+2, Joseph Obernberger wrote:
Hi Marc - I've been able to get it to run longer, but am
now getting a RowTooBigException from HBase. How does
JanusGraph store data in HBase? The current max size of a
row is 1 GByte, which makes me think this error is covering
something else up.
What I'm seeing so far in testing with a 5 server cluster
- each machine with 128G of RAM:
HBase table is 1.5G in size, split across 7 regions, and
has 20,001,105 rows. To do a g.V().count() takes 2 hours
and results in 3,842,755 vertices.
Another HBase table is 5.7G in size, split across 10
regions, has 57,620,276 rows, and took 6.5 hours to run the
count, resulting in 10,859,491 nodes. When running, it
looks like it hits one server very hard even though the
YARN tasks are distributed across the cluster. One HBase
node gets hammered.
The RowTooBigException is below. Anything to try? Thank
you for any help!
org.janusgraph.core.JanusGraphException: Could not
process individual retrieval call
at
org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:257)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1269)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1137)
at
org.janusgraph.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:209)
at
org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:75)
at
org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
at
org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
at
org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at
org.janusgraph.hadoop.formats.util.input.current.JanusGraphHadoopSetupImpl.getTypeInspector(JanusGraphHadoopSetupImpl.java:60)
at
org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.<init>(JanusGraphVertexDeserializer.java:55)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat.lambda$static$0(GiraphInputFormat.java:49)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat$RefCountedCloseable.acquire(GiraphInputFormat.java:100)
at
org.janusgraph.hadoop.formats.util.GiraphRecordReader.<init>(GiraphRecordReader.java:47)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat.createRecordReader(GiraphInputFormat.java:67)
at
org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:166)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.janusgraph.core.JanusGraphException:
Could not call index
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1262)
at
org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:255)
... 34 more
Caused by: org.janusgraph.core.JanusGraphException:
Could not execute operation due to backend exception
at
org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
at
org.janusgraph.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:444)
at
org.janusgraph.diskstorage.BackendTransaction.indexQuery(BackendTransaction.java:395)
at
org.janusgraph.graphdb.query.graph.MultiKeySliceQuery.execute(MultiKeySliceQuery.java:51)
at
org.janusgraph.graphdb.database.IndexSerializer.query(IndexSerializer.java:529)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.lambda$call$5(StandardJanusGraphTx.java:1258)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:97)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:89)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:81)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1258)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1255)
at
com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
at
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1255)
... 35 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException:
Could not successfully complete backend operation due to
repeated temporary exceptions after PT10S
at
org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
at
org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
... 53 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException:
Temporary failure in storage backend
at
org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getHelper(HBaseKeyColumnValueStore.java:202)
at
org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getSlice(HBaseKeyColumnValueStore.java:90)
at
org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
at
org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
at
org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:398)
at
org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:395)
at
org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
... 54 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Failed after attempts=35, exceptions:
Sat Aug 05 07:22:03 EDT 2017, RpcRetryingCaller{globalStartTime=1501932111280,
pause=100, retries=35}, org.apache.hadoop.hbase.regionserver.RowTooBigException:
rg.apache.hadoop.hbase.regionserver.RowTooBigException:
Max row size allowed: 1073741824, but the row is bigger
than that.
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:564)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5697)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5856)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5634)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5611)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5597)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6792)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6770)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2023)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
|
|
Joe Obernberger <joseph.o...@...>
Marc - thank you. I've updated the classpath and removed nearly
all of the CDH jars; had to keep chimera and some of the HBase
libs in there. Apart from those and all the jars in lib.zip, it
is working as it did before. The reason I turned DEBUG off was
because it was producing 100+GBytes of logs. Nearly all of which
are things like:
18:04:29 DEBUG
org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore -
Generated HBase Filter ColumnRangeFilter [\x10\xC0, \x10\xC1)
18:04:29 DEBUG
org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Guava
vertex cache size: requested=20000 effective=20000 (min=100)
18:04:29 DEBUG
org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache -
Created dirty vertex map with initial size 32
18:04:29 DEBUG
org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache -
Created vertex cache with max size 20000
18:04:29 DEBUG
org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore -
Generated HBase Filter ColumnRangeFilter [\x10\xC2, \x10\xC3)
18:04:29 DEBUG
org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Guava
vertex cache size: requested=20000 effective=20000 (min=100)
18:04:29 DEBUG
org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache -
Created dirty vertex map with initial size 32
18:04:29 DEBUG
org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache -
Created vertex cache with max size 20000
Do those mean anything to you? I've turned it back on for running
with smaller graph sizes, but so far I don't see anything helpful
there apart from an exception about not setting HADOOP_HOME.
Here are the spark properties; notice the nice and small
extraClassPath! :)
Name | Value
gremlin.graph | org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.deriveMemory | false
gremlin.hadoop.graphReader | org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter | org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.graphWriter.hasEdges | false
gremlin.hadoop.inputLocation | none
gremlin.hadoop.jarsInDistributedCache | true
gremlin.hadoop.memoryOutputFormat | org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.outputLocation | output
janusgraphmr.ioformat.conf.storage.backend | hbase
janusgraphmr.ioformat.conf.storage.hbase.region-count | 5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server | 5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names | false
janusgraphmr.ioformat.conf.storage.hbase.table | TEST0.2.0
janusgraphmr.ioformat.conf.storage.hostname | 10.22.5.65:2181
log4j.appender.STDOUT | org.apache.log4j.ConsoleAppender
log4j.logger.deng | WARNING
log4j.rootLogger | STDOUT
org.slf4j.simpleLogger.defaultLogLevel | warn
spark.akka.frameSize | 1024
spark.app.id | application_1502118729859_0041
spark.app.name | Apache TinkerPop's Spark-Gremlin
spark.authenticate | false
spark.cores.max | 64
spark.driver.appUIAddress | http://10.22.5.61:4040
spark.driver.extraJavaOptons | -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.driver.extraLibraryPath | /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.driver.host | 10.22.5.61
spark.driver.port | 38529
spark.dynamicAllocation.enabled | true
spark.dynamicAllocation.executorIdleTimeout | 60
spark.dynamicAllocation.minExecutors | 0
spark.dynamicAllocation.schedulerBacklogTimeout | 1
spark.eventLog.dir | hdfs://host001:8020/user/spark/applicationHistory
spark.eventLog.enabled | true
spark.executor.extraClassPath | /opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*:/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:/etc/hbase/conf:
spark.executor.extraJavaOptions | -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.executor.extraLibraryPath | /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.heartbeatInterval | 100000
spark.executor.id | driver
spark.executor.memory | 10240m
spark.externalBlockStore.folderName | spark-27dac3f3-dfbc-4f32-b52d-ececdbcae0db
spark.kyroserializer.buffer.max | 1600m
spark.master | yarn-client
spark.network.timeout | 90000
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS | host005
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES | http://host005:8088/proxy/application_1502118729859_0041
spark.scheduler.mode | FIFO
spark.serializer | org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled | true
spark.shuffle.service.port | 7337
spark.ui.filters | org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.ui.killEnabled | true
spark.yarn.am.extraLibraryPath | /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.appMasterEnv.CLASSPATH | /etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
spark.yarn.config.gatewayPath | /opt/cloudera/parcels
spark.yarn.config.replacementPath | {{HADOOP_COMMON_HOME}}/../../..
spark.yarn.dist.archives | /home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files | /home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/conf/logback.xml
spark.yarn.dist.jars | /opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.historyServer.address | http://host001:18088
zookeeper.znode.parent | /hbase
-Joe
|
|
Hi Joe,
Another thing to try (only tested on Tinkerpop, not on JanusGraph): create the traversalsource as follows:
g = graph.traversal().withComputer(new Computer().graphComputer(SparkGraphComputer).workers(100))
With HadoopGraph this helps hdfs files with very large or no partitions to be split across tasks; I did not check the effect yet for HBaseInputFormat in JanusGraph. And did you add spark.executor.instances=10 (or some suitable number) to your config? And did you check in the RM ui or Spark history server whether these executors were really allocated and started?
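A sketch of what that could look like in the graph properties file (the instance count of 10 is just the example number from the question above; memory and cores need to be sized to your own nodes, and note that Joe's property dump also shows dynamicAllocation enabled, which influences how many executors are actually allocated):
spark.executor.instances=10
spark.executor.memory=4096m
spark.executor.cores=1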
More later,
Marc
|
|
Joe Obernberger <joseph.o...@...>
Thank you Marc.
I did not set spark.executor.instances, but I do have
spark.cores.max set to 64, and within YARN it is configured to
allow as much RAM and as many cores as possible for our 5 server cluster.
job on a table that has 61 regions, I see that 43 tasks are
started and running on all 5 nodes in the Spark UI (and running
top on each of the servers). If I lower the amount of RAM (heap)
that each tasks has (currently set to 10G), they fail with
OutOfMemory exceptions. It still hits one HBase node very hard
and cycles through them. While that may be a reason for a
performance issue, it doesn't explain the massive number of calls
that HBase receives for a count job, and why using
SparkGraphComputer takes so much more time.
Running with your command below appears to not alter the
behavior. I did run a job last night with DEBUG turned on, but it
produced too much logging filling up the log directory on 3 of the
5 nodes before stopping.
Thanks again Marc!
-Joe
On 8/10/2017 7:33 AM, HadoopMarc wrote:
Hi Joe,
Another thing to try (only tested on Tinkerpop, not on
JanusGraph): create the traversalsource as follows:
g = graph.traversal().withComputer(new
Computer().graphComputer(SparkGraphComputer).workers(100))
With HadoopGraph this helps hdfs files with very large or no
partitions to be split across tasks; I did not check the effect
yet for HBaseInputFormat in JanusGraph. And did you add
spark.executor.instances=10 (or some suitable number) to your
config? And did you check in the RM ui or Spark history server
whether these executors were really allocated and started?
More later,
Marc
|
|
Hi Joe,
To shed some more light on the running figures you presented, I ran some tests on my own cluster:
1. I loaded the default janusgraph-hbase table with the following simple script from the console:
graph = JanusGraphFactory.open("conf/janusgraph-hbase.properties")
g = graph.traversal()
m = 1200L
n = 10000L
(0L..<m).each{
    (0L..<n).each{
        v1 = g.addV().id().next()
        v2 = g.addV().id().next()
        g.V(v1).addE('link1').to(g.V(v2)).next()
        g.V(v1).addE('link2').to(g.V(v2)).next()
    }
    g.tx().commit()
}
This script runs about 20(?) minutes and results in 24M vertices and edges committed to the graph.
2. I did an OLTP g.V().count() on this graph from the console: 11 minutes the first time, 10 minutes the second time.
3. I ran OLAP jobs on this graph using janusgraph-hbase in two ways:
a) with g = graph.traversal().withComputer(SparkGraphComputer)
b) with g = graph.traversal().withComputer(new Computer().graphComputer(SparkGraphComputer).workers(10))
The properties file was as in the recipe, with the exception of:
spark.executor.memory=4096m  # smaller values might work, but the 512m from the recipe is definitely too small
spark.executor.instances=4
#spark.executor.cores not set, so default value 1
This resulted in the following running times:
a) stage 0,1,2 => 12min, 12min, 3s => 24min total
b) stage 0,1,2 => 18min, 1min, 86ms => 19min total
Discussion:
- HBase is not an easy source for OLAP: HBase wants large regions for efficiency (configurable, but typically 2-20GB), while mapreduce inputformats (like janusgraph's HBaseInputFormat) take regions as inputsplits by default. This means that only a few executors will read from HBase unless the HBaseInputFormat is extended to split a region's keyspace into multiple inputsplits. This mismatch between the numbers of regions and spark executors is a potential JanusGraph issue. Examples exist to improve on this, e.g. org.apache.hadoop.hbase.mapreduce.RowCounter (an example invocation follows after this list).
- For spark stages after stage 0 (reading from HBase), increasing the number of spark tasks with the "workers()" setting helps optimizing the parallelization. This means that for larger traversals than just a vertex count, the parallelization with spark will really pay off.
- I did not try to repeat your settings with a large number of cores. Various sources discourage the use of spark.executor.cores values larger than 5, e.g. https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/, https://stackoverflow.com/questions/37871194/how-to-tune-spark-executor-number-cores-and-executor-memory
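For reference, the stock RowCounter job mentioned above can be run from a cluster shell roughly like this (table name taken from Joe's property dump; purely illustrative):
hbase org.apache.hadoop.hbase.mapreduce.RowCounter TEST0.2.0
Comparing such a plain HBase scan against the SparkGraphComputer count can also give a feel for how much of the OLAP time goes into the HBase read itself.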
Hopefully, these tests provide you and other readers with some additional perspectives on the configuration of janusgraph-hbase.
Cheers, Marc
On Thursday, August 10, 2017 at 3:40:21 PM UTC+2, Joseph Obernberger wrote:
Thank you Marc.
I did not set spark.executor.instances, but I do have
spark.cores.max set to 64, and within YARN it is configured to
allow as much RAM/cores as possible for our 5-server cluster. When I run a
job on a table that has 61 regions, I see that 43 tasks are
started and running on all 5 nodes in the Spark UI (and running
top on each of the servers). If I lower the amount of RAM (heap)
that each task has (currently set to 10G), they fail with
OutOfMemory exceptions. It still hits one HBase node very hard
and cycles through them. While that may be a reason for a
performance issue, it doesn't explain the massive number of calls
that HBase receives for a count job, or why using
SparkGraphComputer takes so much more time.
Running with your command below does not appear to alter the
behavior. I did run a job last night with DEBUG turned on, but it
produced too much logging, filling up the log directory on 3 of the
5 nodes before stopping.
Thanks again Marc!
-Joe
On 8/10/2017 7:33 AM, HadoopMarc wrote:
Hi Joe,
Another thing to try (only tested on Tinkerpop, not on
JanusGraph): create the traversalsource as follows:
g = graph.traversal().withComputer(new Computer().graphComputer(SparkGraphComputer).workers(100))
With HadoopGraph, this helps HDFS files that have very large or no
partitions to be split across tasks; I did not check the effect
yet for HBaseInputFormat in JanusGraph. And did you add
spark.executor.instances=10 (or some suitable number) to your
config? And did you check in the RM UI or the Spark history server
whether these executors were really allocated and started?
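For reference, a minimal sketch of what such an addition to the Spark section of the HadoopGraph properties file could look like (the file name and exact values here are illustrative assumptions, not taken from your setup):

# assumed file: conf/hadoop-graph/read-hbase-spark-yarn.properties
spark.master=yarn-client
spark.executor.instances=10
spark.executor.memory=4096m
spark.executor.cores=1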
More later,
Marc
On Thursday, August 10, 2017 at 00:13:09 UTC+2, Joseph Obernberger wrote:
Marc - thank you. I've updated the classpath and removed
nearly all of the CDH jars; I had to keep chimera and some
of the HBase libs in there. Apart from those and all the
jars in lib.zip, it is working as it did before. The
reason I turned DEBUG off was that it was producing
100+ GBytes of logs, nearly all of which are entries like:
18:04:29 DEBUG org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore - Generated HBase Filter ColumnRangeFilter [\x10\xC0, \x10\xC1)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Guava vertex cache size: requested=20000 effective=20000 (min=100)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created dirty vertex map with initial size 32
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created vertex cache with max size 20000
18:04:29 DEBUG org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore - Generated HBase Filter ColumnRangeFilter [\x10\xC2, \x10\xC3)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Guava vertex cache size: requested=20000 effective=20000 (min=100)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created dirty vertex map with initial size 32
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created vertex cache with max size 20000
Do those mean anything to you? I've turned it back on for
running with smaller graph sizes, but so far I don't see
anything helpful there apart from an exception about not
setting HADOOP_HOME.
Here are the spark properties; notice the nice and small
extraClassPath! :)
Name | Value
gremlin.graph | org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.deriveMemory | false
gremlin.hadoop.graphReader | org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter | org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.graphWriter.hasEdges | false
gremlin.hadoop.inputLocation | none
gremlin.hadoop.jarsInDistributedCache | true
gremlin.hadoop.memoryOutputFormat | org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.outputLocation | output
janusgraphmr.ioformat.conf.storage.backend | hbase
janusgraphmr.ioformat.conf.storage.hbase.region-count | 5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server | 5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names | false
janusgraphmr.ioformat.conf.storage.hbase.table | TEST0.2.0
janusgraphmr.ioformat.conf.storage.hostname | 10.22.5.65:2181
log4j.appender.STDOUT | org.apache.log4j.ConsoleAppender
log4j.logger.deng | WARNING
log4j.rootLogger | STDOUT
org.slf4j.simpleLogger.defaultLogLevel | warn
spark.akka.frameSize | 1024
spark.app.id | application_1502118729859_0041
spark.app.name | Apache TinkerPop's Spark-Gremlin
spark.authenticate | false
spark.cores.max | 64
spark.driver.appUIAddress | http://10.22.5.61:4040
spark.driver.extraJavaOptons | -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.driver.extraLibraryPath | /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.driver.host | 10.22.5.61
spark.driver.port | 38529
spark.dynamicAllocation.enabled | true
spark.dynamicAllocation.executorIdleTimeout | 60
spark.dynamicAllocation.minExecutors | 0
spark.dynamicAllocation.schedulerBacklogTimeout | 1
spark.eventLog.dir | hdfs://host001:8020/user/spark/applicationHistory
spark.eventLog.enabled | true
spark.executor.extraClassPath | /opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*:/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:/etc/hbase/conf:
spark.executor.extraJavaOptions | -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.executor.extraLibraryPath | /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.heartbeatInterval | 100000
spark.executor.id | driver
spark.executor.memory | 10240m
spark.externalBlockStore.folderName | spark-27dac3f3-dfbc-4f32-b52d-ececdbcae0db
spark.kyroserializer.buffer.max | 1600m
spark.master | yarn-client
spark.network.timeout | 90000
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS | host005
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES | http://host005:8088/proxy/application_1502118729859_0041
spark.scheduler.mode | FIFO
spark.serializer | org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled | true
spark.shuffle.service.port | 7337
spark.ui.filters | org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.ui.killEnabled | true
spark.yarn.am.extraLibraryPath | /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.appMasterEnv.CLASSPATH | /etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
spark.yarn.config.gatewayPath | /opt/cloudera/parcels
spark.yarn.config.replacementPath | {{HADOOP_COMMON_HOME}}/../../..
spark.yarn.dist.archives | /home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files | /home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/conf/logback.xml
spark.yarn.dist.jars | /opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.historyServer.address | http://host001:18088
zookeeper.znode.parent | /hbase
|
-Joe
On 8/9/2017 3:33 PM, HadoopMarc wrote:
Hi Gari and Joe,
Glad to see you testing the recipes for MapR and
Cloudera respectively! I am sure that you realized by
now that getting this to work is like walking through a
minefield. If you deviate from the known path, the odds
for getting through are dim, and no one wants to be in
your vicinity. So, if you see a need to deviate (which
there may be for the hadoop distributions you use), you
will need your mine sweeper, that is, put the logging
level to DEBUG for relevant java packages.
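One way to keep that DEBUG output manageable is to raise the root level and enable DEBUG only for the packages under investigation; a minimal logback.xml sketch (logger names and pattern chosen for illustration) could look like:

<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- DEBUG only for the janusgraph/tinkerpop packages of interest -->
  <logger name="org.janusgraph" level="DEBUG"/>
  <logger name="org.apache.tinkerpop" level="DEBUG"/>
  <!-- everything else stays at WARN to keep log volume down -->
  <root level="WARN">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>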
This is where you deviated:
- for Gari: you put all kinds of MapR lib folders on
the application master's classpath (other classpath
configs are not visible from your post)
- for Joe: you put all kinds of Cloudera lib folders
on the executors' classpath (worst of all, the
spark-assembly.jar)
Probably, you are experiencing all kinds of mismatches in
netty libraries, which slow down or even kill all
comms between the yarn containers. The philosophy of
the recipes really is to only add the minimum number
of conf folders and jars to the Tinkerpop/Janusgraph
distribution and see from there if any libraries are
missing.
On my side, it has become apparent that I should at least add to the recipes:
- proof of work for a medium-sized graph (say 10M
vertices and edges)
- configs for the number of executors present in the
OLAP job (instead of relying on spark default number
of 2)
So, still some work to do!
Cheers, Marc
|
|