Error when running OLAP with Spark on YARN
"sa...@gmail.com" <santh...@...>
Hello everyone! I am trying to run JanusGraph in Spark YARN mode, but I hit several errors that I suspect are caused by an inappropriate jar configuration. The error is: "java.lang.IllegalAccessError: org/apache/spark/launcher/CommandBuilderUtils". I could not find any official jar configuration for Spark on YARN in the TinkerPop docs. I searched a lot of posts and followed this one: https://groups.google.com/g/janusgraph-users/c/mzyPFWYpJEI/m/0SVau-N0BQAJ

My configuration is quite similar; my cluster (on CDH 6.3.1) runs:

Hadoop 3.0.0
Spark 2.4.0 (Scala 2.11)
HBase 2.1.0

Based on the jar configuration in that post, I added the following jars to the JanusGraph lib directory:

spark-yarn_2.11-2.4.0-cdh6.3.1.jar
scala-reflect-2.11.12.jar
scala-reflect-2.11.8.jar
hadoop-yarn-server-web-proxy-3.0.0-cdh6.3.1.jar
guice-3.0.jar
guice-4.0.jar
guice-servlet-3.0.jar
guice-servlet-4.0.jar

The whole error stack trace is:

java.lang.IllegalStateException: java.lang.IllegalAccessError: org/apache/spark/launcher/CommandBuilderUtils at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:88) at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143) at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50) at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:68) at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143) at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:197) at org.apache.tinkerpop.gremlin.console.Console$_closure3.doCall(Console.groovy:255) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041) at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:37) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127) at org.codehaus.groovy.tools.shell.Groovysh.setLastResult(Groovysh.groovy:463) at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43) at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:190) at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:58) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51) at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:63) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:168) at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:201) at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy) at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101) at 
groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:144) at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:83) at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:120) at org.codehaus.groovy.tools.shell.Shell$leftShift$1.call(Unknown Source) at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:93) at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:144) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:164) at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:138) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43) at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:190) at 
org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:58) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:160) at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:57) at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:144) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:164) at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:97) at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234) at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:168) at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:234) at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:502) Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: org/apache/spark/launcher/CommandBuilderUtils at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:68) ... 
74 more Caused by: java.lang.IllegalAccessError: org/apache/spark/launcher/CommandBuilderUtils at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:943) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:183) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:186) at org.apache.spark.SparkContext.<init>(SparkContext.scala:511) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2549) at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala) at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52) at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60) at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:313) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
==>hadoopgraph[hbaseinputformat->nulloutputformat]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[hbaseinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().count()
19:44:45 WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
19:44:45 WARN org.apache.spark.SparkContext - Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at: org.apache.spark.SparkContext.getOrCreate(SparkContext.scala) org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52) org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60) org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:313) java.util.concurrent.FutureTask.run(FutureTask.java:266) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) java.lang.Thread.run(Thread.java:748)
19:44:50 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19:44:50 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19:44:50 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19:44:50 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19:44:50 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19:44:50 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
19:44:56 WARN org.apache.spark.deploy.yarn.Client - Same path resource file:/tmp/spark-gremlin-0-5-1.zip added multiple times to distributed cache.
java.lang.IllegalAccessError: org/apache/spark/launcher/CommandBuilderUtils
Type ':help' or ':h' for help.
Display stack trace?
[yN]y
(same IllegalStateException / IllegalAccessError stack trace as the one above)

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
==>hadoopgraph[hbaseinputformat->nulloutputformat]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[hbaseinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().count()
19:45:55 WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
19:45:55 WARN org.apache.spark.SparkContext - Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at: org.apache.spark.SparkContext.getOrCreate(SparkContext.scala) org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52) org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60) org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:313) java.util.concurrent.FutureTask.run(FutureTask.java:266) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) java.lang.Thread.run(Thread.java:748)
19:45:55 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19:45:55 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19:45:55 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19:45:55 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19:45:55 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19:45:55 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
19:45:55 WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
19:45:57 WARN org.apache.spark.deploy.yarn.Client - Same path resource file:/tmp/spark-gremlin-0-5-1.zip added multiple times to distributed cache.
java.lang.IllegalAccessError: org/apache/spark/launcher/CommandBuilderUtils
Type ':help' or ':h' for help.
Display stack trace? [yN]y
(same IllegalStateException / IllegalAccessError stack trace as the one above)

I thought it could be a conflict among CDH jars, but I have not found any post or StackOverflow discussion about this error. Is there any official TinkerPop documentation on how to configure the jars for Spark on YARN? Hoping for any reply!
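One way to track down an IllegalAccessError like this is to find every jar on the classpath that bundles the class named in the error; more than one hit usually means two incompatible versions are colliding. A diagnostic sketch (the demo jars below are stand-ins built on the fly; on a real install you would point the scan at the JanusGraph lib/ directory instead):

```python
import zipfile
import pathlib

# Build two stand-in "jars" (jars are just zip files) so the scan is runnable;
# on a real install, set LIB to the janusgraph lib/ directory instead.
LIB = pathlib.Path("demo-lib")
LIB.mkdir(exist_ok=True)
with zipfile.ZipFile(LIB / "spark-launcher_2.11-2.4.0.jar", "w") as z:
    z.writestr("org/apache/spark/launcher/CommandBuilderUtils.class", b"")
with zipfile.ZipFile(LIB / "guava-18.0.jar", "w") as z:
    z.writestr("com/google/common/base/Joiner.class", b"")

# Find every jar that bundles the class named in the IllegalAccessError.
TARGET = "org/apache/spark/launcher/CommandBuilderUtils.class"
hits = [j.name for j in sorted(LIB.glob("*.jar"))
        if TARGET in zipfile.ZipFile(j).namelist()]
print(hits)  # -> ['spark-launcher_2.11-2.4.0.jar']
```

If the same scan over the real lib/ directory reports both a stock and a CDH-built spark-launcher jar, the one that does not match the cluster's Spark build is the likely culprit.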
HadoopMarc <bi...@...>
The stack trace seems only related to the fact that the SparkContext remained alive after a failed attempt at finding the right configs, so restart the Gremlin Console to get rid of this particular failure mode. The added jars have some suspicious items: both scala-reflect-2.11.8.jar and scala-reflect-2.11.12.jar are present, as are both the 3.0 and 4.0 versions of guice and guice-servlet. Only one version of each artifact should be on the classpath.
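Such duplicates can be spotted mechanically by stripping version suffixes from the jar names and looking for artifact names that occur more than once. A small sketch over the jar names from the original post (the version-stripping regex is only a heuristic for common naming patterns, not an official rule):

```python
import re
from collections import Counter

# Jar names as listed in the original post; on a real install,
# read them from the JanusGraph lib/ directory instead.
jars = [
    "spark-yarn_2.11-2.4.0-cdh6.3.1.jar",
    "scala-reflect-2.11.12.jar", "scala-reflect-2.11.8.jar",
    "hadoop-yarn-server-web-proxy-3.0.0-cdh6.3.1.jar",
    "guice-3.0.jar", "guice-4.0.jar",
    "guice-servlet-3.0.jar", "guice-servlet-4.0.jar",
]

def artifact(name):
    # Strip the trailing version (and optional -cdh suffix) to get the artifact name.
    return re.sub(r"-\d[\d.]*(-cdh[\d.]+)?\.jar$", "", name)

# Artifact names that appear with more than one version.
dupes = sorted(a for a, n in Counter(map(artifact, jars)).items() if n > 1)
print(dupes)  # -> ['guice', 'guice-servlet', 'scala-reflect']
```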
Best wishes, Marc

On Wednesday, September 2, 2020 at 15:15:07 UTC+2, sa...@... wrote:
"sa...@gmail.com" <santh...@...>
Hello Marc! Thanks for the reply! I have seen you in a lot of posts; I appreciate your help! I have checked all the jars in my CDH libs and replaced them whenever a related error occurred. So, following the post I mentioned above and the errors, I made these jar changes:

spark-yarn_2.11-2.4.0-cdh6.3.1.jar
scala-reflect-2.11.12.jar
hadoop-yarn-server-web-proxy-3.0.0-cdh6.3.1.jar
guice-4.0.jar
guice-servlet-4.0.jar

Jars replaced as the errors pointed at them:

spark-core_2.11-2.4.0.jar ==> spark-core_2.11-2.4.0-cdh6.3.1.jar (org.apache.spark.network.util.NettyUtils.defaultNumThreads)
spark-network-common_2.11-2.4.0.jar, spark-network-shuffle_2.11-2.4.0.jar ==> spark-network-common_2.11-2.4.0-cdh6.3.1.jar, spark-network-shuffle_2.11-2.4.0-cdh6.3.1.jar
spark-launcher_2.11-2.4.0.jar ==> spark-launcher_2.11-2.4.0-cdh6.3.1.jar (org/apache/spark/launcher/CommandBuilderUtils)

After that Spark seemed fine, but another Hadoop error occurred, about "hadoop common FastNumberFormat", so I also changed these jars to the CDH versions:

hadoop-yarn-api.jar
hadoop-yarn-client.jar
hadoop-yarn-common.jar
hadoop-yarn-server-common.jar

The newest error is: "java.net.ConnectException: Call From XXXX-172-16-1-XXX/172.16.1.XXX to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: ConnectionRefused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused". Among all my configuration files, only gremlin-server.yaml sets a host to the local 0.0.0.0; I did not set it in read-hbase.properties. I forgot to post my read-hbase.properties yesterday:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname==172.16.1.XX,172.16. XX,172.16.1. XX,172.16.1. XX,172.16.1. XX,172.16.1. XX,172.16.1. XX,172.16.1. XX
janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph
#
# SparkGraphComputer Configuration
#
spark.master=yarn
spark.submit.deployMode=client
spark.yarn.dist.jars=/tmp/spark-gremlin-0-5-1.zip
spark.yarn.archive=/tmp/spark-gremlin-0-5-1.zip
spark.yarn.appMasterEnv.CLASSPATH=./__spark_libs__/*:[hadoop_conf_dir]
spark.executor.extraClassPath=./__spark_libs__/*:/[hadoop_conf_dir]
spark.driver.extraLibraryPath=/home/janusgraph-full-0.5.1/cloudera_native/
spark.io.compression.codec=snappy
spark.executor.memory=4g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

I want to use OLAP because I found that if I use plain OLTP to traverse the whole graph, e.g. g.E().count() on a graph with 660k edges, it always runs into "GC overhead" problems. Is it normal that an OLTP traversal over the whole graph in JanusGraph causes GC or Java heap problems? (A graph with 660k edges and 440k vertices is not that big, is it?)
The complete stack trace:

java.net.ConnectException: Call From XXX-172-16-1-XXX/172.16.1.XXX to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor73.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1413)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy27.getNewApplication(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:266)
    at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy28.getNewApplication(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:256)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:264)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:186)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:511)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2549)
    at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
    at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52)
    at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60)
    at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:313)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
    at org.apache.hadoop.ipc.Client.call(Client.java:1452)
    ... 25 more
15:24:45 ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
java.net.ConnectException: Call From kXXX-172-16-1-XXX/172.16.1.XXX to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
(the same stack trace as above, logged a second time by SparkContext)

On Wednesday, September 2, 2020, at 11:09:19 PM UTC+8, HadoopMarc wrote:
HadoopMarc <bi...@...>
OK, some steps made! The stack trace says that the yarn client cannot connect to the yarn Resource Manager. So, check how you replaced the [hadoop_conf_dir] placeholder; that directory should contain the xml files with the connection details for your cluster.

There is no particular reason why OLTP g.E().count() could not work. Apparently, the new vertices come in faster than they can be GC'ed after counting. You can try to increase the JVM memory with export JAVA_OPTIONS="-Xmx2048m" or play with the JVM GC settings.

I am afraid that OLAP with janusgraph-hbase is not going to help much, because probably all your data is stored in a single HBase region, and the HBaseInputFormat puts an entire region in a single hadoop partition (so there is nothing to parallelize for spark).

Best wishes, Marc

On Thursday, September 3, 2020 at 09:36:17 UTC+2, sa...@... wrote:
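To make the 0.0.0.0:8032 symptom concrete: 8032 is the default Resource Manager port, and 0.0.0.0 is the fallback a YARN client uses when no yarn-site.xml is found on its classpath. A sketch of the check suggested here, using a made-up yarn-site.xml and an invented host name (cdh02), might look like:

```shell
# Build a sample yarn-site.xml such as [hadoop_conf_dir] should contain;
# the host cdh02 is an invented example value.
conf=$(mktemp -d)
cat > "$conf/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>cdh02:8032</value>
  </property>
</configuration>
EOF
# Pull out the configured Resource Manager address. If the yarn client
# resolved this file, it would dial cdh02:8032 instead of the
# 0.0.0.0:8032 default that appears in the ConnectException.
rm_addr=$(grep -A1 'yarn.resourcemanager.address' "$conf/yarn-site.xml" \
          | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
echo "$rm_addr"
```

If the same extraction against the real [hadoop_conf_dir] prints nothing, the client has no way to find the Resource Manager and falls back to the default.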
"sa...@gmail.com" <santh...@...>
Hello Marc! Thank you again! Sorry for my silly mistake: I forgot to add the hadoop conf dir to /etc/profile (I used to export it manually in the console). But I still encounter a lot of ERRORs caused by jar conflicts. I replaced all spark-*, hadoop-*, common-* and hadoop-yarn-* jars in jg/lib with my CDH jars. Now I can hand in the spark job, but there is still an error. The full output is:

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
==>hadoopgraph[hbaseinputformat->nulloutputformat]
gremlin> g.V(256)
No such property: g for class: groovysh_evaluate
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[hbaseinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V(256)
15:32:01 WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
15:32:09 WARN org.apache.spark.deploy.yarn.Client - Same path resource file:/tmp/spark-gremlin-0-5-1.zip added multiple times to distributed cache.
15:32:18 ERROR org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
15:32:18 ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
org.apache.spark.SparkException: Application application_1598456265077_0153 failed 2 times due to AM Container for appattempt_1598456265077_0153_000002 exited with exitCode: 1
Failing this attempt. Diagnostics: [2020-09-05 15:32:17.636] Exception from container-launch.
Container id: container_1598456265077_0153_02_000001
Exit code: 1
[2020-09-05 15:32:17.641] Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
[2020-09-05 15:32:17.642] Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
For more detailed output, check the application tracking page: http://cdh02:8088/cluster/app/application_1598456265077_0153 Then click on links to logs of each attempt. Failing the application.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:95)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:186)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:511)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2549)
    at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
    at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52)
    at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60)
    at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:313)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
15:32:18 WARN org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint - Attempted to request executors before the AM has registered!
15:32:18 WARN org.apache.spark.metrics.MetricsSystem - Stopping a MetricsSystem that is not running
org.apache.spark.SparkException: Application application_1598456265077_0153 failed 2 times due to AM Container for appattempt_1598456265077_0153_000002 exited with exitCode: 1
Failing this attempt. Diagnostics: [2020-09-05 15:32:17.636] Exception from container-launch.
Container id: container_1598456265077_0153_02_000001
Exit code: 1
[2020-09-05 15:32:17.641] Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
[2020-09-05 15:32:17.642] Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

The spark error log shows: Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher. I checked all the spark jars, the hadoop-yarn jars, and every other jar containing "yarn" in my CDH lib, but it still shows the same error. I googled this error and found some blogs that recommend modifying yarn-site.xml to include the yarn jars. The running log I posted above, "15:32:09 WARN org.apache.spark.deploy.yarn.Client - Same path resource file:/tmp/spark-gremlin-0-5-1.zip added multiple times to distributed cache.", shows that the spark job I handed in used the jars in jg/lib (I zipped that directory and put the zip in the /tmp path). So it should have found the spark yarn jar in there, shouldn't it?
I also tried uploading the jars to hdfs://some_path and adding spark.yarn.jars=hdfs://some_path to read-hbase.properties, but I still get the same ERROR: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher. I am really confused about the jars now.

The following errors and jar replacements are a short note for those who may encounter the same errors:

java.lang.NoClassDefFoundError: org/apache/hadoop/util/FastNumberFormat ==> replace all hadoop-* jars and common-* jars (except hadoop-gremlin). (If you only replace the hadoop-* jars, the gremlin console fails to start with an error like "java.lang.NoClassDefFoundError: org/apache/commons/configuration2/Configuration".)

java.lang.IllegalStateException: java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext.setAMContainerResourceRequests(Ljava/util/List;)V ==> replace all hadoop-yarn jars

On Thursday, September 3, 2020, at 7:30:17 PM UTC+8, HadoopMarc wrote:
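The jar replacements in the note above can also be scripted. The sketch below only demonstrates the idea on temporary directories with invented jar names; the real JanusGraph lib and CDH jar paths would have to be substituted, so treat it as illustration rather than a tested procedure.

```shell
# Stand-ins for the JanusGraph lib directory and the CDH jar directory;
# both paths and the jar names are invented for this example.
jg_lib=$(mktemp -d)
cdh_jars=$(mktemp -d)
touch "$jg_lib/hadoop-yarn-api-3.0.0.jar"
touch "$cdh_jars/hadoop-yarn-api-3.0.0-cdh6.3.1.jar"
# Set the vanilla hadoop-yarn jars aside instead of deleting them, so the
# swap is reversible, then copy in the CDH builds of the same artifacts.
mkdir "$jg_lib/backup"
mv "$jg_lib"/hadoop-yarn-*.jar "$jg_lib/backup/"
cp "$cdh_jars"/hadoop-yarn-*-cdh6.3.1.jar "$jg_lib/"
ls "$jg_lib"
```

Keeping a backup directory makes it easy to roll back one artifact family at a time when a swap introduces a new NoSuchMethodError instead of fixing one.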
HadoopMarc <bi...@...>
Somehow your spark.yarn.appMasterEnv.CLASSPATH does not contain the spark-yarn jar. Maybe the Environment tab in the spark web UI provides some insight. Try to run without the config line spark.yarn.dist.jars=/tmp/spark-gremlin-0-5-1.zip, because it generates a WARNING and possibly interferes with the spark.yarn.archive line that should result in the __spark_libs__ entry.

Marc

On Monday, September 7, 2020 at 08:35:42 UTC+2, sa...@... wrote:
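Applied to the read-hbase.properties posted earlier in this thread, the change suggested here would amount to something like the following fragment (a sketch of the relevant lines only, not a verified configuration):

```properties
# SparkGraphComputer on YARN: rely on spark.yarn.archive alone and drop
# the spark.yarn.dist.jars line that pointed at the same zip, since the
# duplicate registration triggered the "added multiple times to
# distributed cache" warning.
spark.master=yarn
spark.submit.deployMode=client
# spark.yarn.dist.jars=/tmp/spark-gremlin-0-5-1.zip   (removed)
spark.yarn.archive=/tmp/spark-gremlin-0-5-1.zip
spark.yarn.appMasterEnv.CLASSPATH=./__spark_libs__/*:[hadoop_conf_dir]
spark.executor.extraClassPath=./__spark_libs__/*:[hadoop_conf_dir]
```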