
Re: Proper way to define metaproperties in schema

Jason Plurad <plu...@...>
 

Would you be able to test your scenario against JanusGraph master branch? It's running TP 3.2.6.


On Tuesday, August 29, 2017 at 10:30:00 AM UTC-4, David Brown wrote:
A bit of a follow-up on this. I have determined that the source of my problem is selecting an individual vertex property from a list-cardinality property. I use this traversal: g.V(v).properties('key').hasValue(val).next(). If `val` is anything other than a string, this query returns nothing; it works fine with string-typed properties. For the record, I am using gremlin-python-based code to access the DB remotely, and I think the current Janus only tests against TP 3.2.3 (which predates gremlin-python). My code runs as expected against TP 3.2.4+ with TinkerGraph.
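
For anyone who wants to reproduce this in the console before testing against master, here is a minimal sketch (the 'score' key and its values are illustrative, not from the original report):

graph = JanusGraphFactory.build().set('storage.backend', 'inmemory').set('schema.default', 'none').open()
mgmt = graph.openManagement()
mgmt.makePropertyKey('score').dataType(Integer.class).cardinality(Cardinality.LIST).make()
mgmt.commit()
v = graph.addVertex()
v.property('score', 1)
v.property('score', 2)
graph.tx().commit()
g = graph.traversal()
// reportedly returns nothing over remote gremlin-python for non-string values; works with strings
g.V(v).properties('score').hasValue(1)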

On Monday, August 28, 2017 at 1:44:18 PM UTC-4, David Brown wrote:
Hello JanusGraph users,

I have been experimenting with Janus, and using the automatic schema generation, metaproperties work as expected. However, when I set `schema.default=none` in the conf and define my own schema, metaproperties seem to quit working--metaproperty data is no longer returned in the Gremlin Server response. How should metaproperties be defined in the schema? I can't seem to find this information in the documentation. I can provide example schema definitions if necessary.

Thanks,

Dave


Re: Proper way to define metaproperties in schema

David Brown <dave...@...>
 

A bit of a follow-up on this. I have determined that the source of my problem is selecting an individual vertex property from a list-cardinality property. I use this traversal: g.V(v).properties('key').hasValue(val).next(). If `val` is anything other than a string, this query returns nothing; it works fine with string-typed properties. For the record, I am using gremlin-python-based code to access the DB remotely, and I think the current Janus only tests against TP 3.2.3 (which predates gremlin-python). My code runs as expected against TP 3.2.4+ with TinkerGraph.


On Monday, August 28, 2017 at 1:44:18 PM UTC-4, David Brown wrote:
Hello JanusGraph users,

I have been experimenting with Janus, and using the automatic schema generation, metaproperties work as expected. However, when I set `schema.default=none` in the conf and define my own schema, metaproperties seem to quit working--metaproperty data is no longer returned in the Gremlin Server response. How should metaproperties be defined in the schema? I can't seem to find this information in the documentation. I can provide example schema definitions if necessary.

Thanks,

Dave


Re: Proper way to define metaproperties in schema

David Brown <dave...@...>
 



On Monday, August 28, 2017 at 4:01:40 PM UTC-4, David Brown wrote:
Thanks for the quick replies. Thanks to the examples, I've determined this is a bug in Goblin.

On Monday, August 28, 2017 at 2:07:38 PM UTC-4, Jason Plurad wrote:
I opened up an issue to add docs on meta-properties and multi-properties.

This worked in the Gremlin Console:

gremlin> graph = JanusGraphFactory.build().set('storage.backend', 'inmemory').set('schema.default', 'none').open()
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7a360554
gremlin> name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>name
gremlin> foo = mgmt.makePropertyKey('foo').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>foo
gremlin> mgmt.commit()
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
gremlin> v = g.addV('name', 'dave').next()
==>v[4232]
gremlin> g.V(v).properties('name').property('foo', 'bar').iterate()
gremlin> g.V(v).valueMap(true)
==>[label:vertex,id:4232,name:[dave]]
gremlin> g.V(v).properties('name').valueMap(true)
==>[value:dave,id:sx-39k-sl,foo:bar,key:name]
gremlin> g.V(v).properties('name').property('bla', 'dat').iterate()
Property Key with given name does not exist: bla

Were you trying something different?


On Monday, August 28, 2017 at 1:44:18 PM UTC-4, David Brown wrote:
Hello JanusGraph users,

I have been experimenting with Janus, and using the automatic schema generation, metaproperties work as expected. However, when I set `schema.default=none` in the conf and define my own schema, metaproperties seem to quit working--metaproperty data is no longer returned in the Gremlin Server response. How should metaproperties be defined in the schema? I can't seem to find this information in the documentation. I can provide example schema definitions if necessary.

Thanks,

Dave


New committer: David Clement

Jason Plurad <plu...@...>
 

On behalf of the JanusGraph Technical Steering Committee (TSC), I'm pleased to welcome a new committer on the project!

David Clement has submitted several good pull requests which enhanced the functionality for the indexing backends, both ES and Solr. He has been thorough and quite responsive to the feedback offered in the reviews.


extras that provide business value

an...@...
 

Janus is such a compelling database to switch to. Right now we use Neo4j. There are a couple of features Neo4j has that are very useful, and I was wondering whether they are on the roadmap for Janus:

1. Bulk uploading via CSV files. The Neo4j import tool is powerful; it's like a language dedicated to bulk loading existing data. There are many cases where this is a must-have. (A rough console-based workaround is sketched below.)
2. Data visualization via the browser. Pretty data visualizations make everyone happy.
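
On point 1, there is no dedicated CSV importer today, but small-to-medium loads can be scripted from the Gremlin console. A minimal sketch, assuming a headerless file people.csv with name,age columns (file name and property keys are hypothetical):

graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje.properties')
g = graph.traversal()
new File('people.csv').eachLine { line ->
    def cols = line.tokenize(',')
    // one vertex per row; for large files, commit in batches instead of once at the end
    g.addV('person').property('name', cols[0]).property('age', cols[1].toInteger()).iterate()
}
graph.tx().commit()

For very large datasets, the BulkLoaderVertexProgram route discussed elsewhere in this digest is the more scalable option.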



Re: Proper way to define metaproperties in schema

David Brown <dave...@...>
 

Thanks for the quick replies. Thanks to the examples, I've determined this is a bug in Goblin.


On Monday, August 28, 2017 at 2:07:38 PM UTC-4, Jason Plurad wrote:
I opened up an issue to add docs on meta-properties and multi-properties.

This worked in the Gremlin Console:

gremlin> graph = JanusGraphFactory.build().set('storage.backend', 'inmemory').set('schema.default', 'none').open()
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7a360554
gremlin> name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>name
gremlin> foo = mgmt.makePropertyKey('foo').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>foo
gremlin> mgmt.commit()
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
gremlin> v = g.addV('name', 'dave').next()
==>v[4232]
gremlin> g.V(v).properties('name').property('foo', 'bar').iterate()
gremlin> g.V(v).valueMap(true)
==>[label:vertex,id:4232,name:[dave]]
gremlin> g.V(v).properties('name').valueMap(true)
==>[value:dave,id:sx-39k-sl,foo:bar,key:name]
gremlin> g.V(v).properties('name').property('bla', 'dat').iterate()
Property Key with given name does not exist: bla

Were you trying something different?


On Monday, August 28, 2017 at 1:44:18 PM UTC-4, David Brown wrote:
Hello JanusGraph users,

I have been experimenting with Janus, and using the automatic schema generation, metaproperties work as expected. However, when I set `schema.default=none` in the conf and define my own schema, metaproperties seem to quit working--metaproperty data is no longer returned in the Gremlin Server response. How should metaproperties be defined in the schema? I can't seem to find this information in the documentation. I can provide example schema definitions if necessary.

Thanks,

Dave


Re: Proper way to define metaproperties in schema

Jason Plurad <plu...@...>
 

I opened up an issue to add docs on meta-properties and multi-properties.

This worked in the Gremlin Console:

gremlin> graph = JanusGraphFactory.build().set('storage.backend', 'inmemory').set('schema.default', 'none').open()
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7a360554
gremlin> name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>name
gremlin> foo = mgmt.makePropertyKey('foo').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>foo
gremlin> mgmt.commit()
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
gremlin> v = g.addV('name', 'dave').next()
==>v[4232]
gremlin> g.V(v).properties('name').property('foo', 'bar').iterate()
gremlin> g.V(v).valueMap(true)
==>[label:vertex,id:4232,name:[dave]]
gremlin> g.V(v).properties('name').valueMap(true)
==>[value:dave,id:sx-39k-sl,foo:bar,key:name]
gremlin> g.V(v).properties('name').property('bla', 'dat').iterate()
Property Key with given name does not exist: bla

Were you trying something different?


On Monday, August 28, 2017 at 1:44:18 PM UTC-4, David Brown wrote:
Hello JanusGraph users,

I have been experimenting with Janus, and using the automatic schema generation, metaproperties work as expected. However, when I set `schema.default=none` in the conf and define my own schema, metaproperties seem to quit working--metaproperty data is no longer returned in the Gremlin Server response. How should metaproperties be defined in the schema? I can't seem to find this information in the documentation. I can provide example schema definitions if necessary.

Thanks,

Dave


Proper way to define metaproperties in schema

David Brown <dave...@...>
 

Hello JanusGraph users,

I have been experimenting with Janus, and using the automatic schema generation, metaproperties work as expected. However, when I set `schema.default=none` in the conf and define my own schema, metaproperties seem to quit working--metaproperty data is no longer returned in the Gremlin Server response. How should metaproperties be defined in the schema? I can't seem to find this information in the documentation. I can provide example schema definitions if necessary.

Thanks,

Dave


Re: How can we bulk load the edges while we have the vertexes in our JanusGraph DB?

stan...@...
 

Brother, add me on QQ: 175501069. We're also discussing JanusGraph in our group; we've all just started using it recently.

On Wednesday, August 9, 2017 at 2:26:33 PM UTC+8, hu junjie wrote:

Assume we already have the vertices in the DB, and we have the edge information in GraphSON/XML/TXT. How can we import the edges into JanusGraph?
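
For modest edge counts, a console script is one workable approach. A minimal sketch, assuming the existing vertices carry a unique, indexed 'userId' property and a file edges.txt with lines of the form sourceId,targetId,label (all names hypothetical):

graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
g = graph.traversal()
new File('edges.txt').eachLine { line ->
    def (src, dst, label) = line.tokenize(',')
    // look up both endpoints by their indexed key, then connect them
    def from = g.V().has('userId', src).next()
    def to   = g.V().has('userId', dst).next()
    from.addEdge(label, to)
}
graph.tx().commit()

For bulk loads at scale, committing in batches (e.g. every 10k edges) or using the BulkLoaderVertexProgram is worth considering.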


Re: JanusGraph seems to force embedded ElasticSearch

Jason Plurad <plu...@...>
 

This commonly happens if you're connecting to a graph instance that was previously created. The initial configuration is stored within the graph itself.

For BerkeleyJE, try pointing storage.directory to a new location or deleting the existing db/berkeley directory.
For Cassandra, try using a different storage.cassandra.keyspace or dropping the existing keyspace.
For HBase, try using a different storage.hbase.table or dropping the existing table.
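
For the BerkeleyJE case in the original question, a minimal sketch of starting over against a fresh directory so the new index settings take effect (the directory name is illustrative):

graph = JanusGraphFactory.build().
    set('storage.backend', 'berkeleyje').
    set('storage.directory', 'db/berkeley-fresh').
    set('index.search.backend', 'elasticsearch').
    set('index.search.hostname', '127.0.0.1').
    set('index.search.elasticsearch.client-only', true).open()

Because the earlier settings were persisted inside the old db/berkeley directory, reopening that directory keeps using them regardless of what the properties file now says.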

-- Jason


On Friday, August 25, 2017 at 12:28:27 PM UTC-4, Mike Thomsen wrote:
This is my configuration:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=db/berkeley
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
index.search.elasticsearch.client-only=true

I loaded it with Gremlin like this:

graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-es.properties')

But it insists on creating an embedded ElasticSearch node at db/es despite a remote connection being specified.

What am I doing wrong?

Thanks,

Mike


JanusGraph seems to force embedded ElasticSearch

mikert...@...
 

This is my configuration:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=db/berkeley
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
index.search.elasticsearch.client-only=true

I loaded it with Gremlin like this:

graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-es.properties')

But it insists on creating an embedded ElasticSearch node at db/es despite a remote connection being specified.

What am I doing wrong?

Thanks,

Mike


Re: hey guys ,how to query a person relational depth

李平 <lipin...@...>
 

OK, thanks. Another question: how do I skip a super vertex? Such a vertex has lots of edges, for example when the phone number is 911 or a customer-service number, so a lot of people are connected to it. When I query a person's two-layer relation depth, the traversal ends up visiting a huge number of people.
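
The thread does not answer this, but one common Gremlin pattern is to refuse to expand through vertices above a degree threshold inside the repeat step. A hedged sketch (the 1000 cutoff and the depth of 2 are arbitrary):

// stop expanding through vertices with 1000 or more 'hasPhone' edges
g.V().has('name', 'A').
  repeat(both('hasPhone').simplePath().
         where(bothE('hasPhone').limit(1000).count().is(lt(1000)))).
  times(2).emit().
  path()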


On Friday, August 25, 2017 at 2:32:17 AM UTC+8, Daniel Kuppitz wrote:

Any of the following 2 queries should do the trick:

gremlin> g.V().has('name','A').
           repeat(out('hasPhone').in('hasPhone').simplePath()).emit().
           project('name','depth').
             by('name').
             by(path().count(local))
==>[name:C,depth:3]
==>[name:G,depth:3]
==>[name:E,depth:5]

gremlin> g.V().has('name','A').
           repeat(out('hasPhone').in('hasPhone').simplePath().as('x')).emit().
           project('name','depth').
             by('name').
             by(select(all, 'x').count(local))
==>[name:C,depth:1]
==>[name:G,depth:1]
==>[name:E,depth:2]

Pretty much depends on how you define "relation depth".

Cheers,
Daniel




On Wed, Aug 23, 2017 at 7:26 PM, 李平 <li...@...> wrote:
gremlin> g.addV().property('name', 'A').as('a').
           addV().property('phone', '110').as('b').
           addV().property('name', 'C').as('c').
           addV().property('phone', '111').as('d').
           addV().property('name', 'E').as('e').
           addV().property('phone', '112').as('f').
           addV().property('name', 'G').as('g').
           addE('hasPhone').from('a').to('b').
           addE('hasPhone').from('c').to('d').
           addE('hasPhone').from('c').to('b').
           addE('hasPhone').from('e').to('d').
           addE('hasPhone').from('e').to('f').
           addE('hasPhone').from('g').to('b').iterate()



If I want to know vertex A's relation depth, how do I write the Gremlin command?

I wrote it like this:

g.V().has('userId','1').repeat(__.as("a").out().in().where(neq("a"))).emit().path().count(local).max()



but it seems to loop endlessly.
On Wednesday, August 23, 2017 at 9:16:21 PM UTC+8, Jason Plurad wrote:
There's a recipe for this http://tinkerpop.apache.org/docs/current/recipes/#_maximum_depth

On Wednesday, August 23, 2017 at 3:52:14 AM UTC-4, 李平 wrote:
I want to know a person's relational depth in JanusGraph, using Gremlin.



Re: hey guys ,how to query a person relational depth

Daniel Kuppitz <me@...>
 

Any of the following 2 queries should do the trick:

gremlin> g.V().has('name','A').
           repeat(out('hasPhone').in('hasPhone').simplePath()).emit().
           project('name','depth').
             by('name').
             by(path().count(local))
==>[name:C,depth:3]
==>[name:G,depth:3]
==>[name:E,depth:5]

gremlin> g.V().has('name','A').
           repeat(out('hasPhone').in('hasPhone').simplePath().as('x')).emit().
           project('name','depth').
             by('name').
             by(select(all, 'x').count(local))
==>[name:C,depth:1]
==>[name:G,depth:1]
==>[name:E,depth:2]

Pretty much depends on how you define "relation depth".

Cheers,
Daniel




On Wed, Aug 23, 2017 at 7:26 PM, 李平 <lipin...@...> wrote:
gremlin> g.addV().property('name', 'A').as('a').
           addV().property('phone', '110').as('b').
           addV().property('name', 'C').as('c').
           addV().property('phone', '111').as('d').
           addV().property('name', 'E').as('e').
           addV().property('phone', '112').as('f').
           addV().property('name', 'G').as('g').
           addE('hasPhone').from('a').to('b').
           addE('hasPhone').from('c').to('d').
           addE('hasPhone').from('c').to('b').
           addE('hasPhone').from('e').to('d').
           addE('hasPhone').from('e').to('f').
           addE('hasPhone').from('g').to('b').iterate()



If I want to know vertex A's relation depth, how do I write the Gremlin command?

I wrote it like this:

g.V().has('userId','1').repeat(__.as("a").out().in().where(neq("a"))).emit().path().count(local).max()



but it seems to loop endlessly.
On Wednesday, August 23, 2017 at 9:16:21 PM UTC+8, Jason Plurad wrote:
There's a recipe for this http://tinkerpop.apache.org/docs/current/recipes/#_maximum_depth

On Wednesday, August 23, 2017 at 3:52:14 AM UTC-4, 李平 wrote:
I want to know a person's relational depth in JanusGraph, using Gremlin.



Versioning the structure of a graph.

Ray Scott <raya...@...>
 

I've seen examples of designing a graph separating structure from state so that it's easy to version the state of an entity (vertex) as it's modified through the months and years. However, I've been trying to find a solution for versioning the structure (the edges between the vertices).

Tinkerpop doesn't support edges linked to edges and you can't add properties on edges with set or list cardinality, so even a basic approach of tagging the edges with a version string is not possible.  

I did think of creating a new edge between the vertices with a different version property on it, but I'd possibly end up with hundreds of edges between two vertices all in the name of versioning the structure. Plus you'd have to bulk update the entire subgraph, so that's not going to happen.

Has anyone ever looked into this, or have any hard-earned wisdom they'd like to impart?


Re: Can BulkLoaderVertexProgram also add mixed indexes

mystic m <mita...@...>
 

Thanks Marc, your blog post is helpful.

I started the setup from scratch, but I did replace/add distribution-specific jars for Hadoop and HBase to be able to interact with maprfs and MapR-DB.

Also, I was able to get rid of the MapR spark-assembly from the Gremlin CLASSPATH by placing it in HDFS and adding the spark-yarn jar to the Gremlin CLASSPATH, which lets me submit the Spark job on YARN. I added the janusgraph-hbase and spark-gremlin jars as you specified in the blog; when the Spark job starts, the jars are copied appropriately to the staging area in HDFS, but I still get the exception listed below. In the last setup I had copied the hadoop-gremlin libs into the SPARK_LIB directory across the cluster to resolve the issue; I am not sure why they are not picked up from the HDFS directory. I will debug this more tomorrow and post back.

java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1

        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)



On Thursday, August 24, 2017 at 12:30:58 AM UTC+5:30, HadoopMarc wrote:
Hi m

You might also try the approach I explained in (also discussed in another thread on this forum):

http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.html

Here I show that you do not need the hadoop/hbase/spark jars of your specific distribution. If you get rid of the MapR spark-assembly you do not need the guava shading. The guava shading might be the cause of the ES problems somehow.

HTH,     Marc

On Wednesday, August 23, 2017 at 18:56:32 UTC+2, mystic m wrote:
You are right Jason that ElasticSearchIndex  class is in janusgraph-es-0.1.1.jar, also this jar is available in SPARK_EXECUTOR_CLASSPATH on all the nodes, I can see all janusgraph specific jars (lib + plugin folder) in Spark UI Environment tab and also in Yarn logs it gets added to spark classpath.

I will add few more details about the customizations done in our environment if that helps
  1. To enable integration with MapR-DB and mfs, replaced all hadoop/spark/hbase jars bundled with janusgraph plugin with MapR specific jars.
  2. In order to make Bulk Load withSparkGraphComputer work (no mixed indexes), shaded guava plugin in janusgraph-core and janusgraph-hbase-core
  3. Change #2 made Bulk Load run successfully but broke integration with ElasticSearch, even graph = JanusGraphFactory.open('conf/janusgraph-hbase-es.properties') failed with NoClassDefFoundError for ElasticSearchIndex class.
  4. Reverting back to originally bundled jars resolves #3 but breaks Bulk Load
  5. Next I changed the janusgraph-hadoop-core pom.xml to comment out the test scope for janusgraph-es, which fixed #3, and I was able to execute the GraphOfGods example with a mixed index; however, this fix still breaks the bulk load (even without a mixed index in the schema definition).
I know all of above information is too wide in scope to be covered in a single question/discussion, but what I can conclude is that there is some integration issue when we want to use Janusgraph + HBase + Spark  + ES together which needs to be addressed correctly.

I think Guava-specific conflicts are at the root of these issues and resolving them correctly is required. If you have any insights into fixing this, please let me know.

~mbaxi



On Wednesday, August 23, 2017 at 6:41:32 PM UTC+5:30, Jason Plurad wrote:
The class org.janusgraph.diskstorage.es.ElasticSearchIndex is in janusgraph-es-0.1.1.jar. If you're getting a NoClassDefFoundError, there's really not much more we can tell you other than be completely certain that the jar is on the appropriate classpath. Did you add janusgraph-*.jar only or did you add all jars in the $JANUSGRAPH_HOME/lib directory?

On Tuesday, August 22, 2017 at 1:28:18 PM UTC-4, mystic m wrote:
Hi,

I am exploring JanusGraph bulk loading via SparkGraphComputer. JanusGraph has been set up as a plugin to the TinkerPop server and console, with HBase as the underlying storage and Elasticsearch as the external index store.
I am running this setup on a MapR cluster and had to recompile JanusGraph to resolve Guava-specific conflicts (shaded Guava with relocation).

Next I am trying out the example BulkLoaderVertexProgram code provided in Chapter 33. It works fine while I only have composite and vertex-centric indexes in my schema, but as soon as I define mixed indexes and execute the same code, I end up with the following exception in my Spark job, in stage 2 of job 1:

java.lang.NoClassDefFoundError: Could not initialize class org.janusgraph.diskstorage.es.ElasticSearchIndex

        at java.lang.Class.forName0(Native Method)

        at java.lang.Class.forName(Class.java:264)

        at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)

        at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)

        at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:464)

        at org.janusgraph.diskstorage.Backend.<init>(Backend.java:149)

        at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1850)

        at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)


I have verified that all janusgraph specific jars are in spark executor classpath and mixed indexes work fine with GraphOfGod example.

First, I want to understand whether using BulkLoaderVertexProgram is the right path for populating mixed indexes, or should I load the data first and build the indexes afterwards?

let me know if any additional info is required to dig deeper.

~mbaxi
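
On the load-first-index-later option, here is a hedged sketch of defining and reindexing a mixed index after the bulk load has finished (assuming an existing 'name' property key and the default 'search' index backend; not taken from this thread):

mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
mgmt.buildIndex('byNameMixed', Vertex.class).addKey(name).buildMixedIndex('search')
mgmt.commit()

// wait until the new index is REGISTERED, then reindex the existing data
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameMixed').call()
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex('byNameMixed'), SchemaAction.REINDEX).get()
mgmt.commit()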


Re: hey guys ,how to query a person relational depth

Jason Plurad <plu...@...>
 

Use simplePath() to avoid cycles http://tinkerpop.apache.org/docs/current/reference/#simplepath-step

gremlin> graph = JanusGraphFactory.open('inmemory'); g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
gremlin> g.addV().property('name', 'A').as('a').
......1>            addV().property('phone', '110').as('b').
......2>            addV().property('name', 'C').as('c').
......3>            addV().property('phone', '111').as('d').
......4>            addV().property('name', 'E').as('e').
......5>            addV().property('phone', '112').as('f').
......6>            addV().property('name', 'G').as('g').
......7>            addE('hasPhone').from('a').to('b').
......8>            addE('hasPhone').from('c').to('d').
......9>            addE('hasPhone').from('c').to('b').
.....10>            addE('hasPhone').from('e').to('d').
.....11>            addE('hasPhone').from('e').to('f').
.....12>            addE('hasPhone').from('g').to('b').iterate()
gremlin> g.V().has('name', 'A').repeat(both().simplePath()).emit().path().count(local).max()
11:11:58 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(name = A)]. For better performance, use indexes
==>6



On Wednesday, August 23, 2017 at 10:26:12 PM UTC-4, 李平 wrote:
gremlin> g.addV().property('name', 'A').as('a').
           addV().property('phone', '110').as('b').
           addV().property('name', 'C').as('c').
           addV().property('phone', '111').as('d').
           addV().property('name', 'E').as('e').
           addV().property('phone', '112').as('f').
           addV().property('name', 'G').as('g').
           addE('hasPhone').from('a').to('b').
           addE('hasPhone').from('c').to('d').
           addE('hasPhone').from('c').to('b').
           addE('hasPhone').from('e').to('d').
           addE('hasPhone').from('e').to('f').
           addE('hasPhone').from('g').to('b').iterate()



If I want to know vertex A's relation depth, how do I write the Gremlin command?

I wrote it like this:

g.V().has('userId','1').repeat(__.as("a").out().in().where(neq("a"))).emit().path().count(local).max()



but it seems to loop endlessly.
On Wednesday, August 23, 2017 at 9:16:21 PM UTC+8, Jason Plurad wrote:
There's a recipe for this http://tinkerpop.apache.org/docs/current/recipes/#_maximum_depth

On Wednesday, August 23, 2017 at 3:52:14 AM UTC-4, 李平 wrote:
I want to know a person's relational depth in JanusGraph, using Gremlin.


Re: hey guys ,how to query a person relational depth

李平 <lipin...@...>
 

gremlin> g.addV().property('name', 'A').as('a').
           addV().property('phone', '110').as('b').
           addV().property('name', 'C').as('c').
           addV().property('phone', '111').as('d').
           addV().property('name', 'E').as('e').
           addV().property('phone', '112').as('f').
           addV().property('name', 'G').as('g').
           addE('hasPhone').from('a').to('b').
           addE('hasPhone').from('c').to('d').
           addE('hasPhone').from('c').to('b').
           addE('hasPhone').from('e').to('d').
           addE('hasPhone').from('e').to('f').
           addE('hasPhone').from('g').to('b').iterate()



If I want to know vertex A's relation depth, how do I write the Gremlin command?

I wrote it like this:

g.V().has('userId','1').repeat(__.as("a").out().in().where(neq("a"))).emit().path().count(local).max()



but it seems to loop endlessly.
On Wednesday, August 23, 2017 at 9:16:21 PM UTC+8, Jason Plurad wrote:

There's a recipe for this http://tinkerpop.apache.org/docs/current/recipes/#_maximum_depth

On Wednesday, August 23, 2017 at 3:52:14 AM UTC-4, 李平 wrote:
I want to know a person's relational depth in JanusGraph, using Gremlin.


Re: [BLOG] Configuring JanusGraph for spark-yarn

HadoopMarc <bi...@...>
 

Hi Joe,

Thanks for reporting back your results and confirming the recipe for CDH. Also, your job execution times now seem consistent with the ones I posted above. As to your question whether these figures make sense: I think the loading part of OLAP jobs with HBaseInputFormat is way too slow and needs attention. At this point you are better off storing the vertex ids on HDFS, doing an RDD mapPartitions on those ids, and having each Spark executor open its own connection to JanusGraph and fetch the vertices it needs, which is low-latency once all HBase caches are warm (I used this approach with Titan and will probably keep it for a while with JanusGraph).

I do not know which plans the JanusGraph team have with the HBaseInputFormat, but I figure they will wait for the future HBase 2.0.0 release which will hopefully cover a number of relevant features, such as:
https://issues.apache.org/jira/browse/HBASE-14789

Cheers,    Marc

On Tuesday, August 22, 2017 at 17:04:03 UTC+2, Joseph Obernberger wrote:

Hi All - I rebuilt JanusGraph from git with the CDH 5.10.0 libraries (just modified the poms) and, using that build, created a new graph with 159,103,508 vertices and 278,901,629 edges.  I then manually moved regions around in HBase and did splits across our 5-server cluster into 88 regions.  The original size was 22 regions.  The test (g.V().count()) took 1.2 hours to run with Spark, and a similar amount of time for the edge count.  I don't have an exact number, but it looks like doing it without Spark took a similar time.  Honestly, I don't know if this is good or bad!

I replaced the jar files in the lib directory with jars from CDH and then rebuilt the lib.zip file.  My configuration follows:

#
# Hadoop Graph Configuration
#

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.deriveMemory=false

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=output
gremlin.hadoop.outputLocation=output

log4j.rootLogger=WARNING, STDOUT
log4j.logger.deng=WARNING
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
org.slf4j.simpleLogger.defaultLogLevel=warn

#
# JanusGraph HBase InputFormat configuration
#

janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=10.22.5.63:2181,10.22.5.64:2181,10.22.5.65:2181
janusgraphmr.ioformat.conf.storage.hbase.table=FullSpark
janusgraphmr.ioformat.conf.storage.hbase.region-count=44
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
janusgraphmr.ioformat.conf.storage.cache.db-cache-size = 0.5
zookeeper.znode.parent=/hbase

#
# SparkGraphComputer with Yarn Configuration
#

spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.driver.extraJavaOptons=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.master=yarn-cluster
spark.executor.memory=10240m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar,/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/conf/logback.xml
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.appMasterEnv.CLASSPATH=/etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
#spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.akka.frameSize=1024
spark.kyroserializer.buffer.max=1600m
spark.network.timeout=90000
spark.executor.heartbeatInterval=100000
spark.cores.max=5 

#
# Relevant configs from spark-defaults.conf
#

spark.authenticate=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.killEnabled=true
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:\
/etc/hbase/conf:
spark.eventLog.dir=hdfs://host001:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=http://host001:18088
#spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/lib/spark-assembly.jar
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.master=yarn-client

Hope that helps!

-Joe

On 8/21/2017 2:40 AM, liu...@... wrote:
Hey Joseph - did your test succeed? Can you share your experience with me? Thanks

On Tuesday, August 15, 2017 at 6:17:12 AM UTC+8, Joseph Obernberger wrote:

Marc - thank you for this.  I'm going to try getting the latest version of JanusGraph, and compiling it with our specific version of Cloudera CDH, then run some tests.  Will report back.

-Joe


On 8/13/2017 4:07 PM, HadoopMarc wrote:

Hi Joe,

To shed some more light on the running figures you presented, I ran some tests on my own cluster:

1. I loaded the default janusgraph-hbase table with the following simple script from the console:

graph=JanusGraphFactory.open("conf/janusgraph-hbase.properties")
g = graph.traversal()
m = 1200L
n = 10000L
(0L..<m).each{
        (0L..<n).each{
                v1 = g.addV().id().next()
                v2 = g.addV().id().next()
                g.V(v1).addE('link1').to(g.V(v2)).next()
                g.V(v1).addE('link2').to(g.V(v2)).next()
        }
        g.tx().commit()
}

This script runs in about 20(?) minutes and results in 24M vertices and edges committed to the graph.

2. I did an OLTP g.V().count() on this graph from the console: 11 minutes first time, 10 minutes second time

3. I ran OLAP jobs on this graph using janusgraph-hhbase in two ways:
    a) with g = graph.traversal().withComputer(SparkGraphComputer)  
    b) with g = graph.traversal().withComputer(new Computer().graphComputer(SparkGraphComputer).workers(10))

the properties file was as in the recipe, with the exception of:
   spark.executor.memory=4096m       # smaller values might work, but the 512m from the recipe is definitely too small
   spark.executor.instances=4
   #spark.executor.cores not set, so default value 1

This resulted in the following running times:
   a) stage 0,1,2 => 12min, 12min, 3s => 24min total
   b) stage 0,1,2 => 18min, 1min, 86ms => 19 min total

Discussion:
  • HBase is not an easy source for OLAP: HBase wants large regions for efficiency (configurable, but typically 2-20GB), while mapreduce inputformats (like janusgraph's HBaseInputFormat) take regions as inputsplits by default. This means that only a few executors will read from HBase unless the HBaseInputFormat is extended to split a region's keyspace into multiple inputsplits. This mismatch between the numbers of regions and spark executors is a potential JanusGraph issue. Examples exist to improve on this, e.g. org.apache.hadoop.hbase.mapreduce.RowCounter

  • For spark stages after stage 0 (reading from HBase), increasing the number of spark tasks with the "workers()" setting helps optimizing the parallelization. This means that for larger traversals than just a vertex count, the parallelization with spark will really pay off.

  • I did not try to repeat your settings with a large number of cores. Various sources discourage the use of spark.executor.cores values larger than 5, e.g. https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/, https://stackoverflow.com/questions/37871194/how-to-tune-spark-executor-number-cores-and-executor-memory
Hopefully, these tests provide you and other readers with some additional perspectives on the configuration of janusgraph-hbase.

Cheers,    Marc

On Thursday, August 10, 2017 at 15:40:21 UTC+2, Joseph Obernberger wrote:

Thank you Marc.

I did not set spark.executor.instances, but I do have spark.cores.max set to 64, and YARN is configured to allow as much RAM/cores as possible for our 5-server cluster.  When I run a job on a table that has 61 regions, I see that 43 tasks are started and running on all 5 nodes in the Spark UI (and by running top on each of the servers).  If I lower the amount of RAM (heap) that each task has (currently set to 10G), they fail with OutOfMemory exceptions.  It still hits one HBase node very hard and cycles through them.  While that may be a reason for a performance issue, it doesn't explain the massive number of calls that HBase receives for a count job, or why using SparkGraphComputer takes so much more time.

Running with your command below appears to not alter the behavior.  I did run a job last night with DEBUG turned on, but it produced too much logging filling up the log directory on 3 of the 5 nodes before stopping. 
Thanks again Marc!

-Joe


On 8/10/2017 7:33 AM, HadoopMarc wrote:
Hi Joe,

Another thing to try (only tested on Tinkerpop, not on JanusGraph): create the traversalsource as follows:

g = graph.traversal().withComputer(new Computer().graphComputer(SparkGraphComputer).workers(100))

With HadoopGraph this helps hdfs files with very large or no partitions to be split across tasks; I did not check the effect yet for HBaseInputFormat in JanusGraph. And did you add spark.executor.instances=10 (or some suitable number) to your config? And did you check in the RM ui or Spark history server whether these executors were really allocated and started?

More later,

Marc

On Thursday, August 10, 2017 at 00:13:09 UTC+2, Joseph Obernberger wrote:

Marc - thank you.  I've updated the classpath and removed nearly all of the CDH jars; had to keep chimera and some of the HBase libs in there.  Apart from those and all the jars in lib.zip, it is working as it did before.  The reason I turned DEBUG off was because it was producing 100+GBytes of logs.  Nearly all of which are things like:

18:04:29 DEBUG org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore - Generated HBase Filter ColumnRangeFilter [\x10\xC0, \x10\xC1)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Guava vertex cache size: requested=20000 effective=20000 (min=100)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created dirty vertex map with initial size 32
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created vertex cache with max size 20000
18:04:29 DEBUG org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore - Generated HBase Filter ColumnRangeFilter [\x10\xC2, \x10\xC3)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Guava vertex cache size: requested=20000 effective=20000 (min=100)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created dirty vertex map with initial size 32
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created vertex cache with max size 20000

Do those mean anything to you?  I've turned it back on for running with smaller graph sizes, but so far I don't see anything helpful there apart from an exception about not setting HADOOP_HOME.
Here are the spark properties; notice the nice and small extraClassPath!  :)

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.graphWriter.hasEdges=false
gremlin.hadoop.inputLocation=none
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.outputLocation=output
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
janusgraphmr.ioformat.conf.storage.hbase.table=TEST0.2.0
janusgraphmr.ioformat.conf.storage.hostname=10.22.5.65:2181
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
log4j.logger.deng=WARNING
log4j.rootLogger=STDOUT
org.slf4j.simpleLogger.defaultLogLevel=warn
spark.akka.frameSize=1024
spark.app.id=application_1502118729859_0041
spark.app.name=Apache TinkerPop's Spark-Gremlin
spark.authenticate=false
spark.cores.max=64
spark.driver.appUIAddress=http://10.22.5.61:4040
spark.driver.extraJavaOptons=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.driver.host=10.22.5.61
spark.driver.port=38529
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.dir=hdfs://host001:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*:/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:/etc/hbase/conf:
spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.heartbeatInterval=100000
spark.executor.id=driver
spark.executor.memory=10240m
spark.externalBlockStore.folderName=spark-27dac3f3-dfbc-4f32-b52d-ececdbcae0db
spark.kyroserializer.buffer.max=1600m
spark.master=yarn-client
spark.network.timeout=90000
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS=host005
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES=http://host005:8088/proxy/application_1502118729859_0041
spark.scheduler.mode=FIFO
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.filters=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.ui.killEnabled=true
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.appMasterEnv.CLASSPATH=/etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.yarn.dist.archives=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/conf/logback.xml
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.historyServer.address=http://host001:18088
zookeeper.znode.parent=/hbase


-Joe

On 8/9/2017 3:33 PM, HadoopMarc wrote:
Hi Gari and Joe,

Glad to see you testing the recipes for MapR and Cloudera respectively!  I am sure that you realized by now that getting this to work is like walking through a minefield. If you deviate from the known path, the odds for getting through are dim, and no one wants to be in your vicinity. So, if you see a need to deviate (which there may be for the hadoop distributions you use), you will need your mine sweeper, that is, put the logging level to DEBUG for relevant java packages.

This is where you deviated:
  • for Gari: you put all kinds of MapR lib folders on the applications master's classpath (other classpath configs are not visible from your post)
  • for Joe: you put all kinds of Cloudera lib folders on the executors classpath (worst of all the spark-assembly.jar)

Probably, you experience all kinds of mismatches in netty libraries which slows down or even kills all comms between the yarn containers. The philosophy of the recipes really is to only add the minimum number of conf folders and jars to the Tinkerpop/Janusgraph distribution and see from there if any libraries are missing.


At my side, it has become apparent that I should at least add to the recipes:

  • proof of work for a medium-sized graph (say 10M vertices and edges)
  • configs for the number of executors present in the OLAP job (instead of relying on spark default number of 2)
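
For the second point, a minimal sketch of the executor-related settings that would go into the OLAP properties file (values are illustrative; they mirror the ones used in the tests earlier in this thread):

spark.executor.instances=4
spark.executor.cores=1
spark.executor.memory=4096m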

So, still some work to do!


Cheers,    Marc




Re: Can BulkLoaderVertexProgram also add mixed indexes

HadoopMarc <bi...@...>
 

Hi m

You might also try the approach I explained in (also discussed in another thread on this forum):

http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.html

Here I show that you do not need the hadoop/hbase/spark jars of your specific distribution. If you get rid of the MapR spark-assembly you do not need the guava shading. The guava shading might be the cause of the ES problems somehow.

HTH,     Marc

On Wednesday, August 23, 2017 at 18:56:32 UTC+2, mystic m wrote:

You are right Jason that ElasticSearchIndex  class is in janusgraph-es-0.1.1.jar, also this jar is available in SPARK_EXECUTOR_CLASSPATH on all the nodes, I can see all janusgraph specific jars (lib + plugin folder) in Spark UI Environment tab and also in Yarn logs it gets added to spark classpath.

I will add few more details about the customizations done in our environment if that helps
  1. To enable integration with MapR-DB and mfs, replaced all hadoop/spark/hbase jars bundled with janusgraph plugin with MapR specific jars.
  2. In order to make Bulk Load withSparkGraphComputer work (no mixed indexes), shaded guava plugin in janusgraph-core and janusgraph-hbase-core
  3. Change #2 made Bulk Load run successfully but broke integration with ElasticSearch, even graph = JanusGraphFactory.open('conf/janusgraph-hbase-es.properties') failed with NoClassDefFoundError for ElasticSearchIndex class.
  4. Reverting back to originally bundled jars resolves #3 but breaks Bulk Load
  5. Next I changed the janusgraph-hadoop-core pom.xml to comment out the test scope for janusgraph-es, which fixed #3, and I was able to execute the GraphOfGods example with a mixed index; however, this fix still breaks the bulk load (even without a mixed index in the schema definition).
I know all of above information is too wide in scope to be covered in a single question/discussion, but what I can conclude is that there is some integration issue when we want to use Janusgraph + HBase + Spark  + ES together which needs to be addressed correctly.

I think Guava-specific conflicts are at the root of these issues and resolving them correctly is required. If you have any insights into fixing this, please let me know.

~mbaxi



On Wednesday, August 23, 2017 at 6:41:32 PM UTC+5:30, Jason Plurad wrote:
The class org.janusgraph.diskstorage.es.ElasticSearchIndex is in janusgraph-es-0.1.1.jar. If you're getting a NoClassDefFoundError, there's really not much more we can tell you other than be completely certain that the jar is on the appropriate classpath. Did you add janusgraph-*.jar only or did you add all jars in the $JANUSGRAPH_HOME/lib directory?

On Tuesday, August 22, 2017 at 1:28:18 PM UTC-4, mystic m wrote:
Hi,

I am exploring JanusGraph bulk loading via SparkGraphComputer. JanusGraph has been set up as a plugin to the TinkerPop server and console, with HBase as the underlying storage and Elasticsearch as the external index store.
I am running this setup on a MapR cluster and had to recompile JanusGraph to resolve Guava-specific conflicts (shaded Guava with relocation).

Next I am trying out the example BulkLoaderVertexProgram code provided in Chapter 33. It works fine while I only have composite and vertex-centric indexes in my schema, but as soon as I define mixed indexes and execute the same code, I end up with the following exception in my Spark job, in stage 2 of job 1:

java.lang.NoClassDefFoundError: Could not initialize class org.janusgraph.diskstorage.es.ElasticSearchIndex

        at java.lang.Class.forName0(Native Method)

        at java.lang.Class.forName(Class.java:264)

        at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)

        at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)

        at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:464)

        at org.janusgraph.diskstorage.Backend.<init>(Backend.java:149)

        at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1850)

        at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)


I have verified that all janusgraph specific jars are in spark executor classpath and mixed indexes work fine with GraphOfGod example.

First, I want to understand whether using BulkLoaderVertexProgram is the right path for populating mixed indexes, or should I load the data first and build the indexes afterwards?

let me know if any additional info is required to dig deeper.

~mbaxi


Re: Can BulkLoaderVertexProgram also add mixed indexes

mystic m <mita...@...>
 

You are right Jason that ElasticSearchIndex  class is in janusgraph-es-0.1.1.jar, also this jar is available in SPARK_EXECUTOR_CLASSPATH on all the nodes, I can see all janusgraph specific jars (lib + plugin folder) in Spark UI Environment tab and also in Yarn logs it gets added to spark classpath.

I will add few more details about the customizations done in our environment if that helps
  1. To enable integration with MapR-DB and mfs, replaced all hadoop/spark/hbase jars bundled with janusgraph plugin with MapR specific jars.
  2. In order to make Bulk Load withSparkGraphComputer work (no mixed indexes), shaded guava plugin in janusgraph-core and janusgraph-hbase-core
  3. Change #2 made Bulk Load run successfully but broke integration with ElasticSearch, even graph = JanusGraphFactory.open('conf/janusgraph-hbase-es.properties') failed with NoClassDefFoundError for ElasticSearchIndex class.
  4. Reverting back to originally bundled jars resolves #3 but breaks Bulk Load
  5. Next I changed the janusgraph-hadoop-core pom.xml to comment out the test scope for janusgraph-es, which fixed #3, and I was able to execute the GraphOfGods example with a mixed index; however, this fix still breaks the bulk load (even without a mixed index in the schema definition).
I know all of above information is too wide in scope to be covered in a single question/discussion, but what I can conclude is that there is some integration issue when we want to use Janusgraph + HBase + Spark  + ES together which needs to be addressed correctly.

I think Guava-specific conflicts are at the root of these issues and resolving them correctly is required. If you have any insights into fixing this, please let me know.

~mbaxi



On Wednesday, August 23, 2017 at 6:41:32 PM UTC+5:30, Jason Plurad wrote:
The class org.janusgraph.diskstorage.es.ElasticSearchIndex is in janusgraph-es-0.1.1.jar. If you're getting a NoClassDefFoundError, there's really not much more we can tell you other than be completely certain that the jar is on the appropriate classpath. Did you add janusgraph-*.jar only or did you add all jars in the $JANUSGRAPH_HOME/lib directory?

On Tuesday, August 22, 2017 at 1:28:18 PM UTC-4, mystic m wrote:
Hi,

I am exploring JanusGraph bulk loading via SparkGraphComputer. JanusGraph has been set up as a plugin to the TinkerPop server and console, with HBase as the underlying storage and Elasticsearch as the external index store.
I am running this setup on a MapR cluster and had to recompile JanusGraph to resolve Guava-specific conflicts (shaded Guava with relocation).

Next I am trying out the example BulkLoaderVertexProgram code provided in Chapter 33. It works fine while I only have composite and vertex-centric indexes in my schema, but as soon as I define mixed indexes and execute the same code, I end up with the following exception in my Spark job, in stage 2 of job 1:

java.lang.NoClassDefFoundError: Could not initialize class org.janusgraph.diskstorage.es.ElasticSearchIndex

        at java.lang.Class.forName0(Native Method)

        at java.lang.Class.forName(Class.java:264)

        at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)

        at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)

        at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:464)

        at org.janusgraph.diskstorage.Backend.<init>(Backend.java:149)

        at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1850)

        at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)


I have verified that all janusgraph specific jars are in spark executor classpath and mixed indexes work fine with GraphOfGod example.

First, I want to understand whether using BulkLoaderVertexProgram is the right path for populating mixed indexes, or should I load the data first and build the indexes afterwards?

let me know if any additional info is required to dig deeper.

~mbaxi
