Can BulkLoaderVertexProgram also add mixed indexes?


mystic m <mita...@...>
 

Thanks Marc, your blog post is helpful.

I started the setup from scratch, but I did replace or add the distribution-specific jars for Hadoop and HBase so that I can interact with MapR-FS and MapR-DB.

I was also able to remove the MapR spark-assembly from the Gremlin CLASSPATH by placing it in HDFS and adding the spark-yarn jar to the Gremlin CLASSPATH. This lets me submit the Spark job on YARN. I added the janusgraph-hbase and spark-gremlin jars as you specified in the blog. When the Spark job starts, the jars are copied to the staging area in HDFS as expected, but I still get the exception listed below. In my previous setup I had copied hadoop-gremlin-libs into the SPARK_LIB directory across the cluster to resolve this issue; I am not sure why the jars are not picked up from the HDFS directory. I will debug this more tomorrow and post back.

java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)



HadoopMarc <bi...@...> (Thursday, August 24, 2017 at 12:30:58 AM UTC+5:30)
 

Hi m

You might also try the approach I explained in this blog post (also discussed in another thread on this forum):

http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.html

There I show that you do not need the Hadoop/HBase/Spark jars of your specific distribution. If you get rid of the MapR spark-assembly, you do not need the Guava shading. The Guava shading might somehow be the cause of the ES problems.

HTH, Marc


mystic m <mita...@...> (Wednesday, August 23, 2017 at 18:56:32 UTC+2)
 

You are right, Jason, that the ElasticSearchIndex class is in janusgraph-es-0.1.1.jar. This jar is also available in SPARK_EXECUTOR_CLASSPATH on all the nodes. I can see all JanusGraph-specific jars (lib + plugin folder) in the Environment tab of the Spark UI, and the YARN logs show them being added to the Spark classpath.

Here are a few more details about the customizations done in our environment, in case that helps:
  1. To enable integration with MapR-DB and MapR-FS, I replaced all Hadoop/Spark/HBase jars bundled with the JanusGraph plugin with MapR-specific jars.
  2. To make bulk loading with SparkGraphComputer work (no mixed indexes), I shaded Guava in janusgraph-core and janusgraph-hbase-core.
  3. Change #2 made the bulk load run successfully but broke the integration with Elasticsearch: even graph = JanusGraphFactory.open('conf/janusgraph-hbase-es.properties') failed with a NoClassDefFoundError for the ElasticSearchIndex class.
  4. Reverting to the originally bundled jars resolves #3 but breaks the bulk load.
  5. Next, I changed the janusgraph-hadoop-core pom.xml to comment out the test scope for janusgraph-es. This fixed #3, and I was able to execute the Graph of the Gods example with a mixed index, but this fix still breaks the bulk load (even without a mixed index in the schema definition).

I know all of the above is too wide in scope to be covered in a single question or discussion, but what I can conclude is that there is some integration issue when JanusGraph, HBase, Spark and ES are used together, and it needs to be addressed correctly.

I think Guava-specific conflicts are at the root of these issues and need to be resolved correctly. If you have any insights into fixing this, please let me know.

~mbaxi
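
One way to test the suspicion about Guava conflicts is to look for artifacts that appear on the classpath in more than one version. The Java sketch below is only an illustration under stated assumptions: jars are assumed to follow the usual `<artifact>-<version>.jar` naming, the class name `DuplicateJarFinder` is made up for this example, and the check cannot see classes bundled inside a fat jar such as a spark-assembly.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists artifacts that occur more than once on a classpath string. */
final class DuplicateJarFinder {

    // Assumed naming scheme "<artifact>-<version>.jar", e.g. guava-18.0.jar -> artifact "guava"
    private static final Pattern JAR = Pattern.compile("(.+?)-(\\d[\\w.\\-]*)\\.jar");

    /** Groups classpath entries by artifact name and keeps artifacts with two or more entries. */
    static Map<String, List<String>> findDuplicates(String classpath) {
        Map<String, List<String>> byArtifact = new TreeMap<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            Matcher m = JAR.matcher(new File(entry).getName());
            if (m.matches()) {
                byArtifact.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(entry);
            }
        }
        // Drop artifacts that occur only once; what remains are candidate conflicts
        byArtifact.values().removeIf(paths -> paths.size() < 2);
        return byArtifact;
    }

    public static void main(String[] args) {
        // Pass the executor classpath as the first argument, or fall back to this JVM's own classpath
        String classpath = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        findDuplicates(classpath).forEach((artifact, paths) ->
                System.out.println(artifact + " appears " + paths.size() + " times: " + paths));
    }
}
```

Feeding it the classpath string shown in the Spark UI Environment tab would list, for example, two different guava jars if both the bundled and the distribution-specific one are present.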



Jason Plurad <plu...@...> (Wednesday, August 23, 2017 at 6:41:32 PM UTC+5:30)
 

The class org.janusgraph.diskstorage.es.ElasticSearchIndex is in janusgraph-es-0.1.1.jar. If you're getting a NoClassDefFoundError, there's really not much more we can tell you other than be completely certain that the jar is on the appropriate classpath. Did you add janusgraph-*.jar only or did you add all jars in the $JANUSGRAPH_HOME/lib directory?
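
To be certain which jar actually supplies a class, the JVM can be asked directly. The following Java sketch is a minimal illustration, not something taken from JanusGraph: the class name `ClassOrigin` is hypothetical, and the result is only meaningful when it runs with the same classpath as the Spark executors.

```java
import java.security.CodeSource;

/** Hypothetical helper: reports where the JVM loaded a class from. */
final class ClassOrigin {

    /** Returns the jar or directory that supplied the class, or a note if that cannot be determined. */
    static String locate(String className) {
        try {
            // initialize=false: load the class only, without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (JDK classes) have no code source
            return source == null ? "(bootstrap/JDK class)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Default to the two classes discussed in this thread; other names can be passed as arguments
        String[] names = args.length > 0 ? args : new String[] {
                "org.janusgraph.diskstorage.es.ElasticSearchIndex",
                "com.google.common.base.Stopwatch"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the reported location for a Guava class is a distribution jar or an assembly jar rather than the jar shipped in $JANUSGRAPH_HOME/lib, that points to a version clash rather than a missing jar.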


mystic m <mita...@...> (Tuesday, August 22, 2017 at 1:28:18 PM UTC-4)
 

Hi,

I am exploring JanusGraph bulk loading via SparkGraphComputer. JanusGraph has been set up as a plugin to the TinkerPop server and console, with HBase as the underlying storage and Elasticsearch as the external index store.
I am running this setup on a MapR cluster and had to recompile JanusGraph to resolve Guava-specific conflicts (shaded Guava with relocation).

Next, I am trying out the example BulkLoaderVertexProgram code provided in Chapter 33. It works fine as long as I have only composite and vertex-centric indexes in my schema, but as soon as I define mixed indexes and execute the same code, I get the following exception in my Spark job, in stage 2 of job 1:

java.lang.NoClassDefFoundError: Could not initialize class org.janusgraph.diskstorage.es.ElasticSearchIndex
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)
        at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
        at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:464)
        at org.janusgraph.diskstorage.Backend.<init>(Backend.java:149)
        at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1850)
        at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)


I have verified that all JanusGraph-specific jars are on the Spark executor classpath, and mixed indexes work fine with the Graph of the Gods example.

First, I want to understand: is BulkLoaderVertexProgram the right way to load data when mixed indexes are defined, or should I load the data first and build the indexes afterwards?

Let me know if any additional information is required to dig deeper.

~mbaxi
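
A note on the wording of the exception above: on HotSpot JVMs, "Could not initialize class" is reported when the class was found but an earlier attempt to run its static initializer failed, so it may point to a failure inside ElasticSearchIndex (for instance a dependency clash) rather than to a missing jar. The Java sketch below only illustrates that JVM behavior; `InitFailureDemo` and `Fragile` are made-up names and the simulated failure is an assumption, not the actual cause in this thread.

```java
/** Hypothetical demo: shows how a failed static initializer surfaces on first and later use. */
final class InitFailureDemo {

    /** Stand-in for a class whose static initializer fails, e.g. because of a library version clash. */
    static final class Fragile {
        static final int VALUE = compute();

        private static int compute() {
            throw new IllegalStateException("simulated dependency conflict");
        }
    }

    /** Tries to initialize the named class and describes the outcome. */
    static String probe(String className) {
        try {
            Class.forName(className, true, InitFailureDemo.class.getClassLoader());
            return "initialized";
        } catch (ExceptionInInitializerError e) {
            // First attempt: the real cause is attached to this error
            return "ExceptionInInitializerError caused by " + e.getCause();
        } catch (NoClassDefFoundError e) {
            // Later attempts: the class is marked unusable and the original cause may be missing
            return "NoClassDefFoundError: " + e.getMessage();
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String name = Fragile.class.getName(); // a class literal does not trigger initialization
        System.out.println(probe(name)); // first use reports the underlying exception
        System.out.println(probe(name)); // second use reports "Could not initialize class ..."
    }
}
```

In practice this means the first failure in the executor log, typically an ExceptionInInitializerError appearing earlier than the trace above, is the one worth reading.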