
Re: spark operation janusgraph gremlin python

HadoopMarc <bi...@...>
 


It all depends on the size of your graph and the query you run. How long does the query run as an OLTP query?

OLAP queries do a full graph scan, and SparkGraphComputer enables you to parallelize this over a number of spark executors * cores. So, if you have 4 spark executors with 5 cores each, your parallelism factor is 20 and you may hope that your OLAP query runs maybe a factor of 10 faster than its OLTP sister (spark involves a lot of overhead).
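For reference, a minimal sketch of the Spark side of such a setup, assuming Spark on YARN as the cluster manager (the executor numbers simply mirror the example above, and the serializer lines repeat what the sample configs in this thread already use):

spark.master=yarn
spark.submit.deployMode=client
spark.executor.instances=4
spark.executor.cores=5
spark.executor.memory=4g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator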

Cheers,     Marc

On Friday, 19 June 2020 08:18:36 UTC+2, Real Life Adventure wrote:

Thanks for the reply.
              While running the OLAP query in the Gremlin Console it gets timed out, but I can see the Spark job in the running state.
              Even though I increased the timeout to a large value, it still times out.
              Any help appreciated.
              Once again, thanks.
Thanks,
RLA.
             

On Thu, 18 Jun 2020 at 01:12, HadoopMarc <b...@...> wrote:

One of the reasons this is not documented is that OLAP queries can soon occupy all Gremlin Server resources, after which it becomes unresponsive. The only documentation seems to be the source code: https://github.com/apache/tinkerpop/blob/3.4.7/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py

I would try the following steps:
  • first make sure you can make an OLAP query with SparkGraphComputer from a remote connection in the gremlin console
  • move over to the python console and make the remote connection
  • simply try g.withComputer("SparkGraphComputer").V().limit(10).toList()
If unhelpful error messages appear, please come back. You can also compare the bytecode generated in the Gremlin Console and in the python REPL by simply ending the traversals with the .bytecode field indicator.
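A minimal Gremlin Console sketch of steps 1 and 3 (the conf/remote.yaml file and a server-side HadoopGraph traversal source bound to "g" are assumptions about your setup, not givens):

gremlin> :remote connect tinkerpop.server conf/remote.yaml session
gremlin> :remote console
gremlin> g.withComputer(SparkGraphComputer).V().limit(10).toList()

If that works, the gremlin-python equivalent is the g.withComputer("SparkGraphComputer") line mentioned above.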

HTH,    Marc


On Tuesday, 16 June 2020 19:53:32 UTC+2, Real Life Adventure wrote:
Hi,
            How to achieve graph traversal operations from gremlin-python with the Spark graph computer?
            I don't find examples for Spark operations with gremlin-python.
            Any help appreciated.

Thanks,
RLA.



Re: Unable to connect hbase from Gremlin console of Janus-graph.Temporary failure in storage backend

HadoopMarc <bi...@...>
 

Hi Siju,

OK, the exact steps I indicated point to the HBase standalone mode and that does not help you, because you are using the pseudo-distributed mode.
From your logs it is not clear whether the HBase system and its janusgraph table are in a valid state.

Maybe try the following:
 - use hbase shell to drop the current janusgraph table
 - restart the entire hbase system
 - reopen janusgraph with "conf/janusgraph-hbase.properties": this recreates the default janusgraph table in HBase, as sketched below
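A rough sketch of those steps (the table name 'janusgraph' is the default; adjust it if you set storage.hbase.table to something else):

$ hbase shell
hbase> disable 'janusgraph'
hbase> drop 'janusgraph'
hbase> exit
# restart the HBase daemons, then from the Gremlin Console:
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')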

HTH,   Marc




On Thursday, 18 June 2020 16:17:56 UTC+2, Siju Somarajan wrote:

Hi Marc,
Thanks for the reply.
I used the exact same steps. My HBase starts, but from the Gremlin Console, HBase is not connecting. I read it could be a version mismatch between JanusGraph and HBase. I did trial and error with different versions; the same error results.

On Tuesday, 16 June 2020 20:35:02 UTC+5:30, HadoopMarc wrote:
Hi Siju,

It is not apparent from your description: did you also try the exact instructions from

That might provide an alternative setup for comparison.

HTH,   Marc

On Monday, 15 June 2020 08:47:22 UTC+2, Siju Somarajan wrote:
Hi Nicolas,
Thanks a lot for responding. But I am using JDK 8 only:
siju@siju-X555LA:~$ java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~18.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

Regards,
Siju

On Saturday, 13 June 2020 01:58:37 UTC+5:30, nic...@... wrote:
It seems that you are using OpenJDK 9 or higher. JanusGraph must be used with JDK 8. I do not know if it is the root cause.

Regards,
Nicolas


Re: How to scan all the nodes using spark (with the hbase backend)

rafi ansari <rafi1...@...>
 

Hi David, can you explain the use of newAPIHadoopRDD in the code and how NullWritable and VertexWritable are being used as parameters?

Regards
Rafi


On Tuesday, July 10, 2018 at 1:53:07 AM UTC+5:30, dv...@... wrote:
Hi Jeff,
it worked! And it has been very simple. Below is what I did, translated into Scala (sorry).
Thanks a lot again.
David

private val conf: Configuration = new BaseConfiguration()
conf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph")
conf.setProperty("gremlin.hadoop.graphReader", "org.janusgraph.hadoop.formats.hbase.HBaseInputFormat")
conf.setProperty("gremlin.hadoop.graphWriter", "org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat")
conf.setProperty("janusgraphmr.ioformat.conf.storage.backend", "hbase")
conf.setProperty("janusgraphmr.ioformat.conf.storage.hostname", "snowwhite.fairytales")
conf.setProperty("janusgraphmr.ioformat.conf.storage.hbase.table", "janusgraph")
conf.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

private val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()

private val hadoopConfiguration = ConfUtil.makeHadoopConfiguration(conf)

private val rdd: RDD[(NullWritable, VertexWritable)] =
  sparkSession.sparkContext.newAPIHadoopRDD(
    hadoopConfiguration,
    hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_READER, classOf[InputFormat[NullWritable, VertexWritable]])
      .asInstanceOf[Class[InputFormat[NullWritable, VertexWritable]]],
    classOf[NullWritable],
    classOf[VertexWritable])

rdd.collect().foreach(println(_))



On Friday, 6 July 2018 21:33:43 UTC+2, Jeff Callahan wrote:
It's a thin POJO representation of a Vertex that uses only field types that Spark directly understands - there is a corresponding EdgeDescriptor also.  If I'd had more time, I would have figured out how to make the StarVertex class work in my scenario.  My understanding is StarVertex (and the other Star* classes) largely exists for this purpose but I couldn't quite get it to work in my setup.

So the VertexDescriptor just has 5 fields:

String id;
String label;
List<EdgeDescriptor> inEdges;
List<EdgeDescriptor> outEdges;
Map<String, String> properties;

Obviously this doesn't support the full range of semantics the TinkerPop StarVertex does but it got me unblocked.  EdgeDescriptor is exactly what you'd expect; instead of inEdges and outEdges, it instead has VertexDescriptor fields for its attached vertices.

jeff.

On Friday, July 6, 2018 at 9:57:06 AM UTC-7, dv...@... wrote:
Thanks Jeff,
I'll give it a try. Could you tell me a bit more about this class VertexDescriptor?

On Friday, 6 July 2018 04:27:54 UTC+2, Jeff Callahan wrote:
I probably should have mentioned that I did this with cassandra rather than hbase, hopefully it's still helpful

On Thursday, July 5, 2018 at 7:12:17 PM UTC-7, Jeff Callahan wrote:
Hi -

I recently did this same thing.  Here is the code I used to get it working.  I ran into some IO problems related to the TinkerPop Graph types (StarVertex etc).  I needed to move on to other things so I never investigated deeply enough to understand the root cause, instead writing my own POJO style VertexDescriptor class to work around the problem.  I'm fairly new to spark so it's possible I'm not using best patterns and practices below but it does work.

    // Spark Configuration Options
    SparkConf sparkConfig = new SparkConf();
    sparkConfig.set("spark.master", "SPARK_MASTER_HOSTNAME_HERE");
    sparkConfig.set("spark.driver.maxResultSize", "4g");
    sparkConfig.set("spark.driver.memory", "4g");
    sparkConfig.set("spark.executor.memory", "4g");
    String thisJar = NetworkAnalysisComputer.class.getProtectionDomain().getCodeSource().getLocation().toString();
    sparkConfig.set("spark.jars", thisJar);
    sparkConfig.set("spark.driver.userClassPathFirst", "false");
    sparkConfig.set("spark.executor.userClassPathFirst", "false");
    sparkConfig.set("spark.cores.max", "16");

    // Hadoop I/O Input Options
    Configuration hadoopConfig = new PropertiesConfiguration();
    hadoopConfig.setProperty("janusgraphmr.ioformat.conf.storage.hostname", "CASSANDRA_HOSTNAME_HERE");
    hadoopConfig.setProperty("janusgraphmr.ioformat.conf.storage.backend", "cassandrathrift");
    hadoopConfig.setProperty("cassandra.input.partitioner.class", "org.apache.cassandra.dht.Murmur3Partitioner");
    hadoopConfig.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph");
    hadoopConfig.setProperty("gremlin.hadoop.graphReader", "org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat");
    hadoopConfig.setProperty("gremlin.hadoop.graphWriter", "org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat");

    SparkSession spark =
        SparkSession.builder()
                    .appName("NetworkAnalysisComputer")
                    .config(sparkConfig)
                    .getOrCreate();

    JavaSparkContext context = JavaSparkContext.fromSparkContext(spark.sparkContext());
    Encoder<VertexDescriptor> vertexDescriptorEncoder = Encoders.bean(VertexDescriptor.class);
    Encoder<PathDetails> pathDetailsEncoder = Encoders.bean(PathDetails.class);
    JavaRDD<VertexDescriptor> rdd = new InputFormatRDD().readGraphRDD(hadoopConfig, context).map(v -> new VertexDescriptor(v._2().get()));

    Dataset<VertexDescriptor> typedDataSet = spark.createDataset(JavaRDD.toRDD(rdd), vertexDescriptorEncoder);

Thanks for the pointer to the upcoming Input Format in 0.3, Marc.

Thanks,
jeff.

On Thursday, July 5, 2018 at 6:05:22 AM UTC-7, HadoopMarc wrote:
Hi David,

It seems no one did this before, but HBaseInputFormat really implements the org.apache.hadoop.mapreduce.InputFormat that is required by org.apache.spark.rdd.NewHadoopRDD.

So, I would say, give it a go and come back here if you get stuck.

The easier way using more of TinkerPop is also discussed in:


Cheers,   Marc

On Wednesday, 4 July 2018 10:44:02 UTC+2, dv...@... wrote:
Hi Marc,
I gave it a look and that was already on my radar. I wonder if there is a way to bypass the TinkerPop layer completely and use the input format for getting an RDD[Vertex], only combining the newHadoopAPI and the proper input format.



On Tuesday, 3 July 2018 20:25:50 UTC+2, HadoopMarc wrote:
Hi David,

I do not know, but be sure to check:
  1. http://tinkerpop.apache.org/docs/current/reference/#interacting-with-spark which shows how to get the graph as an RDD (unless you really want to get rid of the TinkerPop deps); see the sketch after this list
  2. the upcoming HBaseTableSnapshotInputFormat in JanusGraph 0.3.0
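A minimal Gremlin Console sketch of option 1 (the properties file name is an assumption; any Hadoop graph config that uses HBaseInputFormat as the graphReader should do):

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
gremlin> g.V().count()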
HTH,    Marc

On Tuesday, 3 July 2018 16:10:31 UTC+2, dv...@... wrote:
I'm trying to understand how to use Spark and the newHadoopAPI, using the JG HBaseInputFormat as a starting point for possibly scanning and/or applying a function to all the nodes of a graph.
I'd like to bypass the Spark OLAP support and try to access the vertices and the edges directly from Spark.
I think that this could be a good starting point to implement a direct mapping between Spark GraphX and JG as an additional parallel computing platform besides TinkerPop's OLAP Spark Computer.
What do you think? Is it possible to get any clue on how to use the newHadoopAPI in combination with the HBaseInputFormat for building an RDD[Vertex]?
Any suggestion would be greatly welcomed.
David


Re: Batch loading -java.lang.IllegalArgumentException

Oleksandr Porunov <alexand...@...>
 

Hi,

Note that you cannot use the automatic schema creation feature together with storage.batch-loading=true.
If you enable storage.batch-loading, then the automatic schema creation feature is disabled.
So, either disable storage.batch-loading or don't use automatic schema creation (explicitly define your schema).

Best regards,
Oleksandr


Re: Janusgraph - OLAP using Dataproc

Claire F <bobo...@...>
 

Hi Marc,

Thanks a lot for your detailed answer. I will give that a try and see if I can get it to work.
Then I hope I'll find a way to marry all that into my Java code once I get it working with the Gremlin Console, but that shouldn't be an issue then.

I am aware that my current config uses Spark locally. However, I seem to have misunderstood the documentation: I thought a Hadoop cluster was still needed for some temporary files, and that is why I thought I'd need Dataproc's Hadoop component as well. Even better if I don't.

Regards and thanks again
Claire

HadoopMarc <bi...@...> wrote on Fri, 19 June 2020, 08:08:

Hi Claire,

As also indicated by Saurabh, your current config runs spark locally on your client node and does not use dataproc at all.

What possibly could work (I never used dataproc myself):
Best wishes,   Marc

On Thursday, 18 June 2020 20:20:50 UTC+2, Claire F wrote:
Hi Saurabh,

Thanks for your reply. 
I am really looking specifically for a setup using Dataproc.

Regards
Claire

SAURABH VERMA <sau...@...> wrote on Thu, 18 June 2020, 19:59:
We've set up JanusGraph OLAP with Spark on YARN; is that something you are looking for?

Thanks



Re: spark operation janusgraph gremlin python

Real Life Adventure <srinu....@...>
 

Thanks for the reply.
              While running the OLAP query in the Gremlin Console it gets timed out, but I can see the Spark job in the running state.
              Even though I increased the timeout to a large value, it still times out.
              Any help appreciated.
              Once again, thanks.
Thanks,
RLA.
             

On Thu, 18 Jun 2020 at 01:12, HadoopMarc <bi...@...> wrote:

One of the reasons this is not documented is that OLAP queries can soon occupy all Gremlin Server resources, after which it becomes unresponsive. The only documentation seems to be the source code: https://github.com/apache/tinkerpop/blob/3.4.7/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py

I would try the following steps:
  • first make sure you can make an OLAP query with SparkGraphComputer from a remote connection in the gremlin console
  • move over to the python console and make the remote connection
  • simply try g.withComputer("SparkGraphComputer").V().limit(10).toList()
If unhelpful error messages appear, please come back. You can also compare the bytecode generated in the Gremlin Console and in the python REPL by simply ending the traversals with the .bytecode field indicator.

HTH,    Marc


On Tuesday, 16 June 2020 19:53:32 UTC+2, Real Life Adventure wrote:
Hi,
            How to achieve graph traversal operations from gremlin-python with the Spark graph computer?
            I don't find examples for Spark operations with gremlin-python.
            Any help appreciated.

Thanks,
RLA.



spark operations with dynamic graphs

Real Life Adventure <srinu....@...>
 

Hi,
               How to do a Spark traversal on dynamically created graphs created with ConfiguredGraphFactory?
               I don't find any documentation on that.
               I found it only with GraphFactory:
graph = GraphFactory.open('conf/hadoop-graph/read-cql-standalone-cluster.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()
Any help appreciated.

Thanks,
RLA.



Re: Janusgraph - OLAP using Dataproc

HadoopMarc <bi...@...>
 

Hi Claire,

As also indicated by Saurabh, your current config runs spark locally on your client node and does not use dataproc at all.

What possibly could work (I never used dataproc myself):
Best wishes,   Marc

On Thursday, 18 June 2020 20:20:50 UTC+2, Claire F wrote:

Hi Saurabh,

Thanks for your reply. 
I am really looking specifically for a setup using Dataproc.

Regards
Claire

SAURABH VERMA <sau...@...> wrote on Thu, 18 June 2020, 19:59:
We've set up JanusGraph OLAP with Spark on YARN; is that something you are looking for?

Thanks



Re: Edge Filter in Janusgraph when working with Spark

rafi ansari <rafi1...@...>
 

Hi Marc

I had one more question. Currently I am using newAPIHadoopRDD to get the VertexWritable objects, like in the sample below. Is this the right approach to working with JanusGraph from Spark, or can there be a better approach?

val rdd: RDD[(NullWritable, VertexWritable)] = spark.sparkContext.newAPIHadoopRDD(
  hadoopConfiguration,
  hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_READER, classOf[InputFormat[NullWritable, VertexWritable]])
    .asInstanceOf[Class[InputFormat[NullWritable, VertexWritable]]],
  classOf[NullWritable],
  classOf[VertexWritable])


The above lines give an RDD as output.

rdd: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.NullWritable, org.apache.tinkerpop.gremlin.hadoop.structure.io.VertexWritable)]

Regards

Rafi

On Friday, June 12, 2020 at 10:02:07 PM UTC+5:30, rafi ansari wrote:
Hi Marc

Thank you so much for the hint.

In continuation of my code above, I had to do the following to filter the edges by label (maybe helpful for someone else):
val vrtxrdd = rdd.map { case (x, y) => y.asInstanceOf[VertexWritable] }
val strvrtxrdd = vrtxrdd.map(x => x.get())    // this gives the StarVertex
val edgesrdd = vrtxrdd.map(x => x.edges(Direction.BOTH, "label1"))

I will try to add Vertex using rdd foreach/map.

Thanks once again.


On Friday, June 12, 2020 at 12:20:55 AM UTC+5:30, HadoopMarc wrote:
Hi Rafi,

Do you mean that you want to filter the vertices based on whether they have an inEdge or an outEdge with a certain label?

Maybe the storage format is just confusing you. The RDD only contains StarVertex objects. An edge is stored twice: once in its inVertex and once in its outVertex. You can use the edges() method on the StarVertex to get the edges.

HTH,   Marc

On Thursday, 11 June 2020 14:38:35 UTC+2, rafi ansari wrote:
Hi All,

I am currently working on using Janusgraph in batch mode using Spark.

I am facing a problem filtering the edges by label.

Below are the specifications:
Spark = 2.4.5
Janusgraph = 0.5.0

Below is the configuration file for Spark:

conf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph")
conf.setProperty("gremlin.hadoop.graphReader", "org.janusgraph.hadoop.formats.cql.CqlInputFormat")
conf.setProperty("gremlin.hadoop.graphWriter", "org.apache.hadoop.mapreduce.lib.output.NullOutputFormat")
conf.setProperty("spark.cassandra.connection.host", "127.0.0.1")
conf.setProperty("janusgraphmr.ioformat.conf.storage.backend", "cql")
conf.setProperty("janusgraphmr.ioformat.conf.storage.hostname", "127.0.0.1")
conf.setProperty("janusgraphmr.ioformat.conf.storage.port", 9042)
conf.setProperty("janusgraphmr.ioformat.conf.storage.cql.keyspace", "graph_db_1")
conf.setProperty("janusgraphmr.ioformat.conf.index.search.backend", "elasticsearch")
conf.setProperty("janusgraphmr.ioformat.conf.index.search.hostname", "127.0.0.1")
conf.setProperty("janusgraphmr.ioformat.conf.index.search.port", 9200)
conf.setProperty("janusgraphmr.ioformat.conf.index.search.index-name", "graph_1")
conf.setProperty("cassandra.input.partitioner.class","org.apache.cassandra.dht.Murmur3Partitioner")
conf.setProperty("cassandra.input.widerows",true)
conf.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.setProperty("spark.kryo.registrator", "org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator")

Below is the Spark code using newAPIHadoopRDD 


val hadoopConfiguration = ConfUtil.makeHadoopConfiguration(conf)

val rdd: RDD[(NullWritable, VertexWritable)] = spark.sparkContext.newAPIHadoopRDD(
  hadoopConfiguration,
  hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_READER, classOf[InputFormat[NullWritable, VertexWritable]])
    .asInstanceOf[Class[InputFormat[NullWritable, VertexWritable]]],
  classOf[NullWritable],
  classOf[VertexWritable])

The above lines give an RDD as output.

rdd: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.NullWritable, org.apache.tinkerpop.gremlin.hadoop.structure.io.VertexWritable)]

rdd.map{case (x,y)=>y.asInstanceOf[VertexWritable]}

res17: Array[String] = Array(v[8344], v[12440], v[4336], v[4320], v[4136], v[8416], v[8192], v[4248], v[4344], v[8432], v[12528], v[4096])

From res17 above, I am not sure how to filter the edges by label.


TIA

Regards

Rafi


Re: Janusgraph - OLAP using Dataproc

Claire F <bobo...@...>
 

Hi Saurabh,

Thanks for your reply. 
I am really looking specifically for a setup using Dataproc.

Regards
Claire

SAURABH VERMA <saurabh...@...> wrote on Thu, 18 June 2020, 19:59:

We've set up JanusGraph OLAP with Spark on YARN; is that something you are looking for?

Thanks



Re: Janusgraph - OLAP using Dataproc

SAURABH VERMA <saurabh...@...>
 

We've set up JanusGraph OLAP with Spark on YARN; is that something you are looking for?

Thanks



--
Thanks & Regards,
Saurabh Verma,
India



Janusgraph - OLAP using Dataproc

bobo...@...
 

Hi,

We are using Janusgraph (0.5.2) with Scylladb as backend. So far we are only using OLTP capabilities but would now like to also do some more advanced batch processing to create shortcut edges, for example for recommendations. To do that, I would like to use the OLAP features.

Reading the documentation this sounds pretty straightforward, assuming one has a Hadoop cluster up and running. But here comes my problem: I would like to use Dataproc - Google's managed solution for Hadoop and Spark. Unfortunately I couldn't find any further information on how to get those two things playing well together.

Does anyone have any experience, hints or documentation on how to properly configure Janusgraph with Dataproc?

In a very first step, a was trying the following (Java application with embedded Janusgraph)

GraphTraversalSource g = GraphFactory.open("graph.properties").traversal().withComputer(SparkGraphComputer.class);
long count = g.V().count().next();
...
g.close();

The graph.properties file looks like this:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true

# Cassandra
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=myhost
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.index.search.backend=lucene
janusgraphmr.ioformat.conf.index.search.directory=/tmp/
janusgraphmr.ioformat.conf.index.search.hostname=127.0.0.1
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true

# Spark
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator


If I just run the code like this, without specifying anything else, it just results in nothing happening and endless log output like this:
18:39:07.749 [Executor task launch worker for task 3] DEBUG o.j.g.t.StandardJanusGraphTx - Guava vertex cache size: requested=20000 effective=20000 (min=100)
18:39:07.749 [Executor task launch worker for task 3] DEBUG o.j.g.t.vertexcache.GuavaVertexCache - Created dirty vertex map with initial size 32
18:39:07.749 [Executor task launch worker for task 3] DEBUG o.j.g.t.vertexcache.GuavaVertexCache - Created vertex cache with max size 20000

Additionally, I added the hdfs-site extracted from dataproc to my classpath, but that didn't help any.

The same in the OLTP world works like a charm. (of course using a proper query, one not iterating over the whole graph .... :D )

Any hints, ideas, experiences or links are greatly appreciated.

Looking forward to some answers,
Claire


Re: Would you use a JanusGraph-like API as a service?

Chen Wu <cjx...@...>
 

Yes, it's an interesting topic, but I can't find any information about this except the IBM Compose offering.

On Saturday, 29 February 2020 at 05:17:27 UTC+8, Ryan Stauffer wrote:

Just a quick thought experiment to put out to the fine folks in this group...

Would you use a "JanusGraph-as-a-Service" offering that works as follows:
  • You're given WebSocket and HTTP endpoints, which you can use to issue Gremlin traversals from any host language or REST client of your choice.
  • You can easily retrieve summary counts of the data you've stored by label and/or property
  • The underlying infrastructure and storage & search backends are transparent, so you don't manage RAM, disk, compute, storage backups, etc
  • You get SLAs regarding latency and throughput based on common workloads.
The idea is to minimize the hurdles to get started building and USING property graphs at scale.  Right now, it's tricky up and down the stack (node sizing, cluster sizing, pluggable backend selection, etc).  I'm just curious how everyone would feel if you DIDN'T have to think about these questions at all, and simply paid a price based on API calls and the total amount of data you're storing.

Thanks in advance for your thoughts!

Ryan 



Re: Unable to connect hbase from Gremlin console of Janus-graph.Temporary failure in storage backend

Siju Somarajan <sij...@...>
 

Hi Marc,
Thanks for the reply.
I used the exact same steps. My HBase starts, but from the Gremlin Console, HBase is not connecting. I read it could be a version mismatch between JanusGraph and HBase. I did trial and error with different versions; the same error results.

On Tuesday, 16 June 2020 20:35:02 UTC+5:30, HadoopMarc wrote:
Hi Siju,

It is not apparent from your description: did you also try the exact instructions from

That might provide an alternative setup for comparison.

HTH,   Marc

On Monday, 15 June 2020 08:47:22 UTC+2, Siju Somarajan wrote:
Hi Nicolas,
Thanks a lot for responding. But I am using JDK 8 only:
siju@siju-X555LA:~$ java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~18.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

Regards,
Siju

On Saturday, 13 June 2020 01:58:37 UTC+5:30, nic...@... wrote:
It seems that you are using OpenJDK 9 or higher. JanusGraph must be used with JDK 8. I do not know if it is the root cause.

Regards,
Nicolas


Problem of uniqueness of key

n0b0...@...
 

I'm trying to find a solution for a "primary key" for vertices, and I found a discussion from 2017:


Then I tried it on my JanusGraph 0.5.2, but the uniqueness constraint does not seem to be working:

gremlin> graph = JanusGraphFactory.open('conf/docker-env.conf')
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> JanusGraphFactory.drop(graph)
==>null
gremlin> graph = JanusGraphFactory.open('conf/docker-env.conf')
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@4f5c757c
gremlin> name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>name
gremlin> nameIndex = mgmt.buildIndex('nameIndex', Vertex.class).addKey(name).unique().buildCompositeIndex()
==>nameIndex
gremlin> mgmt.commit()
==>null
gremlin> graph.addVertex('name', 'huupon')
==>v[4192]
gremlin> graph.addVertex('name', 'huupon')
==>v[4112]
gremlin> 


My backend is Cassandra and Elasticsearch; does anyone know what I missed?
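For reference, here is the follow-up I plan to try (a sketch based on two assumptions on my side, not a confirmed diagnosis: that the constraint is only checked when the transaction commits, and that concurrent writers additionally need LOCK consistency on the index):

gremlin> graph.tx().rollback()   // discard the two uncommitted test vertices
gremlin> mgmt = graph.openManagement()
gremlin> mgmt.setConsistency(mgmt.getGraphIndex('nameIndex'), ConsistencyModifier.LOCK)
gremlin> mgmt.commit()
gremlin> graph.addVertex('name', 'huupon')
gremlin> graph.tx().commit()
gremlin> graph.addVertex('name', 'huupon')
gremlin> graph.tx().commit()   // if the unique constraint is effective, this commit should fail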


Re: spark operation janusgraph gremlin python

HadoopMarc <bi...@...>
 


One of the reasons this is not documented is that OLAP queries can soon occupy all Gremlin Server resources, after which it becomes unresponsive. The only documentation seems to be the source code: https://github.com/apache/tinkerpop/blob/3.4.7/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py

I would try the following steps:
  • first make sure you can make an OLAP query with SparkGraphComputer from a remote connection in the gremlin console
  • move over to the python console and make the remote connection
  • simply try g.withComputer("SparkGraphComputer").V().limit(10).toList()
If unhelpful error messages appear, please come back. You can also compare the bytecode generated in the Gremlin Console and in the python REPL by simply ending the traversals with the .bytecode field indicator.

HTH,    Marc


On Tuesday, 16 June 2020 19:53:32 UTC+2, Real Life Adventure wrote:

Hi,
            How to achieve graph traversal operations from gremlin-python with the Spark graph computer?
            I don't find examples for Spark operations with gremlin-python.
            Any help appreciated.

Thanks,
RLA.


Re: Batch loading -java.lang.IllegalArgumentException

HadoopMarc <bi...@...>
 

Hi Suriya,


schema.default=none means property keys are not automatically created in the schema. You first have to add the createdAt property to the schema or choose a different schema option.
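A minimal sketch of that explicit definition (the Date data type and SINGLE cardinality are assumptions; use whatever your data actually needs):

gremlin> mgmt = graph.openManagement()
gremlin> mgmt.makePropertyKey('createdAt').dataType(Date.class).cardinality(Cardinality.SINGLE).make()
gremlin> mgmt.commit()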

HTH,    Marc


On Wednesday, 17 June 2020 03:38:01 UTC+2, Suriya.Rajasekar wrote:

Hi, I have configured batch loading (storage.batch-loading=true,schema.default = none,ids.block-size=100000). While importing data, I get the following exception :

java.lang.IllegalArgumentException: Property Key with given name does not exist: createdAt
        at org.janusgraph.graphdb.types.typemaker.DisableDefaultSchemaMaker.makePropertyKey(DisableDefaultSchemaMaker.java:46)
        at org.janusgraph.core.schema.DefaultSchemaMaker.makePropertyKey(DefaultSchemaMaker.java:73)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getOrCreatePropertyKey(StandardJanusGraphTx.java:936)
        at org.janusgraph.graphdb.vertices.AbstractVertex.property(AbstractVertex.java:148)
        at org.janusgraph.core.JanusGraphVertex.property(JanusGraphVertex.java:72)
        at org.janusgraph.graphdb.util.ElementHelper.attachProperties(ElementHelper.java:80)
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsTransaction.addVertex(JanusGraphBlueprintsTransaction.java:124)
        at com.fujitsu.fnc.mlpce.tetopology.impl.GraphDBService.addVertex(GraphDBService.java:398)
        at com.fujitsu.fnc.mlpce.tetopology.impl.TeTopologyImpl.createNode(TeTopologyImpl.java:408)
        at com.fujitsu.fnc.mlpce.tetopology.impl.GraphTransactionWrapper.createNode(GraphTransactionWrapper.java:30)
        at com.fujitsu.fnc.mlpce.topology.mgr.impl.metaengine.WDMMetaEngineImpl.buildGDBNodes(WDMMetaEngineImpl.java:511)
        at com.fujitsu.fnc.mlpce.topology.mgr.impl.metaengine.WDMMetaEngineImplResyncWorker.call(WDMMetaEngineImplResyncWorker.java:43)
        at com.fujitsu.fnc.mlpce.topology.mgr.impl.metaengine.WDMMetaEngineImplResyncWorker.call(WDMMetaEngineImplResyncWorker.java:18)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748).

And due to this, the total number of vertices and edges populated is less than it is supposed to be. Why do we get this exception only in batch loading, and are there any suggestions to resolve it?


Batch loading -java.lang.IllegalArgumentException

"Suriya.Rajasekar" <suriya.ja...@...>
 

Hi, I have configured batch loading (storage.batch-loading=true,schema.default = none,ids.block-size=100000). While importing data, I get the following exception :

java.lang.IllegalArgumentException: Property Key with given name does not exist: createdAt
        at org.janusgraph.graphdb.types.typemaker.DisableDefaultSchemaMaker.makePropertyKey(DisableDefaultSchemaMaker.java:46)
        at org.janusgraph.core.schema.DefaultSchemaMaker.makePropertyKey(DefaultSchemaMaker.java:73)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getOrCreatePropertyKey(StandardJanusGraphTx.java:936)
        at org.janusgraph.graphdb.vertices.AbstractVertex.property(AbstractVertex.java:148)
        at org.janusgraph.core.JanusGraphVertex.property(JanusGraphVertex.java:72)
        at org.janusgraph.graphdb.util.ElementHelper.attachProperties(ElementHelper.java:80)
        at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsTransaction.addVertex(JanusGraphBlueprintsTransaction.java:124)
        at com.fujitsu.fnc.mlpce.tetopology.impl.GraphDBService.addVertex(GraphDBService.java:398)
        at com.fujitsu.fnc.mlpce.tetopology.impl.TeTopologyImpl.createNode(TeTopologyImpl.java:408)
        at com.fujitsu.fnc.mlpce.tetopology.impl.GraphTransactionWrapper.createNode(GraphTransactionWrapper.java:30)
        at com.fujitsu.fnc.mlpce.topology.mgr.impl.metaengine.WDMMetaEngineImpl.buildGDBNodes(WDMMetaEngineImpl.java:511)
        at com.fujitsu.fnc.mlpce.topology.mgr.impl.metaengine.WDMMetaEngineImplResyncWorker.call(WDMMetaEngineImplResyncWorker.java:43)
        at com.fujitsu.fnc.mlpce.topology.mgr.impl.metaengine.WDMMetaEngineImplResyncWorker.call(WDMMetaEngineImplResyncWorker.java:18)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748).

And due to this, the total number of vertices and edges populated is less than it is supposed to be. Why do we get this exception only in batch loading, and are there any suggestions to resolve it?


spark operation janusgraph gremlin python

Real Life Adventure <srinu....@...>
 

Hi,
            How to achieve graph traversal operations from gremlin-python with the Spark graph computer?
            I don't find examples for Spark operations with gremlin-python.
            Any help appreciated.

Thanks,
RLA.
