Hi,
I am exploring Janusgraph bulk load via SparkGraphComputer, janusgraph has been setup as plugin to tinkerpop server and console, with HBase as underlying storage and Elasticsearch as external index store. I am running this setup on MapR cluster and had to recompile Janusgraph to resolve guava specific conflicts (shaded guava with relocation).
Next I am trying out the example BulkLoaderVertexProgram code provided in Chapter 33, It works fine till I have composite and vertex centric indexes in my schema, but as soon as I define mixed indexes and execute same code I end up with following exception in my Spark Job in stage 2 of job 1 -
java.lang.NoClassDefFoundError: Could not initialize class org.janusgraph.diskstorage.es.ElasticSearchIndex
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:464)
at org.janusgraph.diskstorage.Backend.<init>(Backend.java:149)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1850)
at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)
I have verified that all janusgraph specific jars are in spark executor classpath and mixed indexes work fine with GraphOfGod example.
First I want to understand is it right path to use BulkLoaderVertexProgram be used to add mixed indexes? or should I upload the data and build indexes thereafter?
let me know if any additional info is required to dig deeper.
~mbaxi
|
|
Jason Plurad <plu...@...>
The class org.janusgraph.diskstorage.es.ElasticSearchIndex is in janusgraph-es-0.1.1.jar. If you're getting a NoClassDefFoundError, there's really not much more we can tell you other than be completely certain that the jar is on the appropriate classpath. Did you add janusgraph-*.jar only or did you add all jars in the $JANUSGRAPH_HOME/lib directory?
toggle quoted message
Show quoted text
On Tuesday, August 22, 2017 at 1:28:18 PM UTC-4, mystic m wrote: Hi,
I am exploring Janusgraph bulk load via SparkGraphComputer, janusgraph has been setup as plugin to tinkerpop server and console, with HBase as underlying storage and Elasticsearch as external index store. I am running this setup on MapR cluster and had to recompile Janusgraph to resolve guava specific conflicts (shaded guava with relocation).
Next I am trying out the example BulkLoaderVertexProgram code provided in Chapter 33, It works fine till I have composite and vertex centric indexes in my schema, but as soon as I define mixed indexes and execute same code I end up with following exception in my Spark Job in stage 2 of job 1 -
java.lang.NoClassDefFoundError: Could not initialize class org.janusgraph.diskstorage.es.ElasticSearchIndex
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:464)
at org.janusgraph.diskstorage.Backend.<init>(Backend.java:149)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1850)
at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)
I have verified that all janusgraph specific jars are in spark executor classpath and mixed indexes work fine with GraphOfGod example.
First I want to understand is it right path to use BulkLoaderVertexProgram be used to add mixed indexes? or should I upload the data and build indexes thereafter?
let me know if any additional info is required to dig deeper.
~mbaxi
|
|
You are right Jason that ElasticSearchIndex class is in janusgraph-es-0.1.1.jar, also this jar is available in SPARK_EXECUTOR_CLASSPATH on all the nodes, I can see all janusgraph specific jars (lib + plugin folder) in Spark UI Environment tab and also in Yarn logs it gets added to spark classpath.
I will add few more details about the customizations done in our environment if that helps - To enable integration with MapR-DB and mfs, replaced all hadoop/spark/hbase jars bundled with janusgraph plugin with MapR specific jars.
- In order to make Bulk Load withSparkGraphComputer work (no mixed indexes), shaded guava plugin in janusgraph-core and janusgraph-hbase-core
- Change #2 made Bulk Load run successfully but broke integration with ElasticSearch, even graph = JanusGraphFactory.open('conf/janusgraph-hbase-es.properties') failed with NoClassDefFoundError for ElasticSearchIndex class.
- Reverting back to originally bundled jars resolves #3 but breaks Bulk Load
- Next I changed janusgraph-hadoop-core pom.xml to comment the test scope for janusgraph-es, which fixed #3 and I was able to execute GraphOfGods example with mixed index, this fix still breaks the Bulk Load (even without mixed index in schema definition.
I know all of above information is too wide in scope to be covered in a single question/discussion, but what I can conclude is that there is some integration issue when we want to use Janusgraph + HBase + Spark + ES together which needs to be addressed correctly.
I think guava specific conflicts are root to these issues and resolving those correctly is required, If you have any insights to fixing this, please let me know.
~mbaxi
toggle quoted message
Show quoted text
On Wednesday, August 23, 2017 at 6:41:32 PM UTC+5:30, Jason Plurad wrote: The class org.janusgraph.diskstorage.es. ElasticSearchIndex is in janusgraph-es-0.1.1.jar. If you're getting a NoClassDefFoundError, there's really not much more we can tell you other than be completely certain that the jar is on the appropriate classpath. Did you add janusgraph-*.jar only or did you add all jars in the $JANUSGRAPH_HOME/lib directory? On Tuesday, August 22, 2017 at 1:28:18 PM UTC-4, mystic m wrote: Hi,
I am exploring Janusgraph bulk load via SparkGraphComputer, janusgraph has been setup as plugin to tinkerpop server and console, with HBase as underlying storage and Elasticsearch as external index store. I am running this setup on MapR cluster and had to recompile Janusgraph to resolve guava specific conflicts (shaded guava with relocation).
Next I am trying out the example BulkLoaderVertexProgram code provided in Chapter 33, It works fine till I have composite and vertex centric indexes in my schema, but as soon as I define mixed indexes and execute same code I end up with following exception in my Spark Job in stage 2 of job 1 -
java.lang.NoClassDefFoundError: Could not initialize class org.janusgraph.diskstorage.es.ElasticSearchIndex
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:464)
at org.janusgraph.diskstorage.Backend.<init>(Backend.java:149)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1850)
at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)
I have verified that all janusgraph specific jars are in spark executor classpath and mixed indexes work fine with GraphOfGod example.
First I want to understand is it right path to use BulkLoaderVertexProgram be used to add mixed indexes? or should I upload the data and build indexes thereafter?
let me know if any additional info is required to dig deeper.
~mbaxi
|
|
Hi m You might also try the approach I explained in (also discussed in another thread on this forum): http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.htmlHere I show that you do not need the hadoop/hbase/spark jars of your specific distribution. If you get rid of the MapR spark-assembly you do not need the guava shading. The guava shading might be the cause of the ES problems somehow. HTH, Marc Op woensdag 23 augustus 2017 18:56:32 UTC+2 schreef mystic m:
toggle quoted message
Show quoted text
You are right Jason that ElasticSearchIndex class is in janusgraph-es-0.1.1.jar, also this jar is available in SPARK_EXECUTOR_CLASSPATH on all the nodes, I can see all janusgraph specific jars (lib + plugin folder) in Spark UI Environment tab and also in Yarn logs it gets added to spark classpath.
I will add few more details about the customizations done in our environment if that helps - To enable integration with MapR-DB and mfs, replaced all hadoop/spark/hbase jars bundled with janusgraph plugin with MapR specific jars.
- In order to make Bulk Load withSparkGraphComputer work (no mixed indexes), shaded guava plugin in janusgraph-core and janusgraph-hbase-core
- Change #2 made Bulk Load run successfully but broke integration with ElasticSearch, even graph = JanusGraphFactory.open('conf/janusgraph-hbase-es.properties') failed with NoClassDefFoundError for ElasticSearchIndex class.
- Reverting back to originally bundled jars resolves #3 but breaks Bulk Load
- Next I changed janusgraph-hadoop-core pom.xml to comment the test scope for janusgraph-es, which fixed #3 and I was able to execute GraphOfGods example with mixed index, this fix still breaks the Bulk Load (even without mixed index in schema definition.
I know all of above information is too wide in scope to be covered in a single question/discussion, but what I can conclude is that there is some integration issue when we want to use Janusgraph + HBase + Spark + ES together which needs to be addressed correctly.
I think guava specific conflicts are root to these issues and resolving those correctly is required, If you have any insights to fixing this, please let me know.
~mbaxi
On Wednesday, August 23, 2017 at 6:41:32 PM UTC+5:30, Jason Plurad wrote: The class org.janusgraph.diskstorage.es. ElasticSearchIndex is in janusgraph-es-0.1.1.jar. If you're getting a NoClassDefFoundError, there's really not much more we can tell you other than be completely certain that the jar is on the appropriate classpath. Did you add janusgraph-*.jar only or did you add all jars in the $JANUSGRAPH_HOME/lib directory? On Tuesday, August 22, 2017 at 1:28:18 PM UTC-4, mystic m wrote: Hi,
I am exploring Janusgraph bulk load via SparkGraphComputer, janusgraph has been setup as plugin to tinkerpop server and console, with HBase as underlying storage and Elasticsearch as external index store. I am running this setup on MapR cluster and had to recompile Janusgraph to resolve guava specific conflicts (shaded guava with relocation).
Next I am trying out the example BulkLoaderVertexProgram code provided in Chapter 33, It works fine till I have composite and vertex centric indexes in my schema, but as soon as I define mixed indexes and execute same code I end up with following exception in my Spark Job in stage 2 of job 1 -
java.lang.NoClassDefFoundError: Could not initialize class org.janusgraph.diskstorage.es.ElasticSearchIndex
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:464)
at org.janusgraph.diskstorage.Backend.<init>(Backend.java:149)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1850)
at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)
I have verified that all janusgraph specific jars are in spark executor classpath and mixed indexes work fine with GraphOfGod example.
First I want to understand is it right path to use BulkLoaderVertexProgram be used to add mixed indexes? or should I upload the data and build indexes thereafter?
let me know if any additional info is required to dig deeper.
~mbaxi
|
|
Thanks Marc, your blog post is helpful.
I started the set-up from scratch but I did replace/added distribution specific jars for hadoop and hbase to be able to interact with maprfs and mapr-db.
Also I was able to get rid of MapR spark-assembly from gremlin CLASSPATH by placing it in hdfs and adding spark-yarn jar to gremlin CLASSPATH. This lets me submit the spark job on yarn. I added the janusgraph-hbase jar and spark-gremlin jars as you have specified in the blog, when spark job starts the jars are copied appropriately to the staging area in hdfs but still I get below listed exception, in last setup I had copied hadoop-gremlin-libs in SPARK_LIB directory across the cluster to resolve the issue, I am not sure why they are not picked from hdfs directory, I will debug this more tomorrow and post back.
java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
toggle quoted message
Show quoted text
On Thursday, August 24, 2017 at 12:30:58 AM UTC+5:30, HadoopMarc wrote: Hi m You might also try the approach I explained in (also discussed in another thread on this forum): http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.htmlHere I show that you do not need the hadoop/hbase/spark jars of your specific distribution. If you get rid of the MapR spark-assembly you do not need the guava shading. The guava shading might be the cause of the ES problems somehow. HTH, Marc Op woensdag 23 augustus 2017 18:56:32 UTC+2 schreef mystic m: You are right Jason that ElasticSearchIndex class is in janusgraph-es-0.1.1.jar, also this jar is available in SPARK_EXECUTOR_CLASSPATH on all the nodes, I can see all janusgraph specific jars (lib + plugin folder) in Spark UI Environment tab and also in Yarn logs it gets added to spark classpath.
I will add few more details about the customizations done in our environment if that helps - To enable integration with MapR-DB and mfs, replaced all hadoop/spark/hbase jars bundled with janusgraph plugin with MapR specific jars.
- In order to make Bulk Load withSparkGraphComputer work (no mixed indexes), shaded guava plugin in janusgraph-core and janusgraph-hbase-core
- Change #2 made Bulk Load run successfully but broke integration with ElasticSearch, even graph = JanusGraphFactory.open('conf/janusgraph-hbase-es.properties') failed with NoClassDefFoundError for ElasticSearchIndex class.
- Reverting back to originally bundled jars resolves #3 but breaks Bulk Load
- Next I changed janusgraph-hadoop-core pom.xml to comment the test scope for janusgraph-es, which fixed #3 and I was able to execute GraphOfGods example with mixed index, this fix still breaks the Bulk Load (even without mixed index in schema definition.
I know all of above information is too wide in scope to be covered in a single question/discussion, but what I can conclude is that there is some integration issue when we want to use Janusgraph + HBase + Spark + ES together which needs to be addressed correctly.
I think guava specific conflicts are root to these issues and resolving those correctly is required, If you have any insights to fixing this, please let me know.
~mbaxi
On Wednesday, August 23, 2017 at 6:41:32 PM UTC+5:30, Jason Plurad wrote: The class org.janusgraph.diskstorage.es. ElasticSearchIndex is in janusgraph-es-0.1.1.jar. If you're getting a NoClassDefFoundError, there's really not much more we can tell you other than be completely certain that the jar is on the appropriate classpath. Did you add janusgraph-*.jar only or did you add all jars in the $JANUSGRAPH_HOME/lib directory? On Tuesday, August 22, 2017 at 1:28:18 PM UTC-4, mystic m wrote: Hi,
I am exploring Janusgraph bulk load via SparkGraphComputer, janusgraph has been setup as plugin to tinkerpop server and console, with HBase as underlying storage and Elasticsearch as external index store. I am running this setup on MapR cluster and had to recompile Janusgraph to resolve guava specific conflicts (shaded guava with relocation).
Next I am trying out the example BulkLoaderVertexProgram code provided in Chapter 33, It works fine till I have composite and vertex centric indexes in my schema, but as soon as I define mixed indexes and execute same code I end up with following exception in my Spark Job in stage 2 of job 1 -
java.lang.NoClassDefFoundError: Could not initialize class org.janusgraph.diskstorage.es.ElasticSearchIndex
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:464)
at org.janusgraph.diskstorage.Backend.<init>(Backend.java:149)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1850)
at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)
I have verified that all janusgraph specific jars are in spark executor classpath and mixed indexes work fine with GraphOfGod example.
First I want to understand is it right path to use BulkLoaderVertexProgram be used to add mixed indexes? or should I upload the data and build indexes thereafter?
let me know if any additional info is required to dig deeper.
~mbaxi
|
|