spark operation janusgraph gremlin python


Real Life Adventure <srinu....@...>
 

Hi,
How can I run graph traversals from gremlin-python with the Spark graph computer?
I can't find any examples of Spark operations with gremlin-python.
Any help is appreciated.

Thanks,
RLA.


HadoopMarc <bi...@...>
 


One of the reasons this is not documented is that OLAP queries can quickly occupy all Gremlin Server resources, after which the server becomes unresponsive. The only documentation seems to be the source code: https://github.com/apache/tinkerpop/blob/3.4.7/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py

I would try the following steps:
  • first make sure you can make an OLAP query with SparkGraphComputer from a remote connection in the gremlin console
  • move over to the python console and make the remote connection
  • simply try g.withComputer("SparkGraphComputer").V().limit(10).toList()
If you get unhelpful error messages, please come back. You can also compare the bytecode generated in the gremlin console and in the python REPL by ending the traversals with .bytecode.
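The python side of these steps could look roughly like the sketch below. It assumes the gremlinpython package is installed, that Gremlin Server listens on localhost:8182, and that the server exposes a traversal source named g whose graph supports SparkGraphComputer. The helper names gremlin_url and olap_sample are hypothetical, and it is not confirmed here whether the server accepts the short name "SparkGraphComputer" or needs the fully qualified class name.

```python
def gremlin_url(host="localhost", port=8182):
    """Build the websocket URL of a Gremlin Server (host and port are assumptions)."""
    return "ws://{}:{}/gremlin".format(host, port)


def olap_sample(url, source="g", computer="SparkGraphComputer", limit=10):
    """Run a small OLAP traversal remotely and return (bytecode, results)."""
    # Imported here so that this module still loads where gremlinpython is absent.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    connection = DriverRemoteConnection(url, source)
    try:
        g = traversal().withRemote(connection)
        t = g.withComputer(computer).V().limit(limit)
        bytecode = t.bytecode  # compare this with the bytecode shown in the gremlin console
        return bytecode, t.toList()
    finally:
        connection.close()
```

Calling olap_sample(gremlin_url()) against a running server would return the bytecode together with up to ten vertices; without a reachable server the call fails when the traversal is submitted.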

HTH,    Marc



Real Life Adventure <srinu....@...>
 

Thanks for the reply.
When I run the OLAP query in the gremlin console it times out, but I can see the Spark job in the running state.
Even after I increased the timeout to a large value, it still times out.
Any help is appreciated.
Once again, thanks.
Thanks,
RLA.
             

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgra...@....
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/5880eb3a-6d0b-4bbd-b6e3-1fe1062442d9o%40googlegroups.com.


HadoopMarc <bi...@...>
 


It all depends on the size of your graph and the query you run. How long does the query take as an OLTP query?

OLAP queries do a full graph scan, and SparkGraphComputer lets you parallelize this over (number of Spark executors) * (cores per executor). So, if you have 4 Spark executors with 5 cores each, your parallelism factor is 20, and you may hope that your OLAP query runs maybe a factor of 10 faster than its OLTP sister (Spark involves a lot of overhead).
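The arithmetic in this estimate can be written out as a small sketch. The function names are hypothetical, and the efficiency value of 0.5 is an assumption chosen only to reproduce the "factor of 10" guess above; it is not a measured number.

```python
def parallelism(executors, cores_per_executor):
    """Number of Spark tasks that can run at the same time."""
    return executors * cores_per_executor


def hoped_speedup(executors, cores_per_executor, efficiency=0.5):
    """Rough speedup over OLTP; `efficiency` is an assumed allowance for Spark overhead."""
    return parallelism(executors, cores_per_executor) * efficiency


print(parallelism(4, 5))    # 20
print(hoped_speedup(4, 5))  # 10.0
```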

Cheers,     Marc



Real Life Adventure <srinu....@...>
 

Thanks for the reply.

I am running the Spark job on an 8-core machine and the graph has no data. However, the operation g.V().count() still times out.
I have tested with different Spark configurations and still get the same error.

Thanks,
RLA.



HadoopMarc <bi...@...>
 

Hi RLA,

To look further into this, I need the Gremlin Server YAML config file and the HadoopGraph properties file that the YAML file refers to.

Best wishes,     Marc



Real Life Adventure <srinu....@...>
 

I have attached the configuration files.

I am running the following commands in the gremlin console.


gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[93c754d4-a785-4f27-8676-f0756d8b6a96]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[93c754d4-a785-4f27-8676-f0756d8b6a96] - type ':remote console' to return to local mode
gremlin> graph
==>standardjanusgraph[cql:[cassandra_dev_node1]]
gremlin> g.V().count()
==>0
gremlin> graph1 = GraphFactory.open('conf/hadoop-graph/read-cql-standalone-cluster.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g1 = graph1.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g1.V().count()
Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [g1.V().count()] - try increasing the timeout with the :remote command
Type ':help' or ':h' for help.
Display stack trace? [yN]
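For the gremlin-python side of this thread, a per-request timeout can be requested on the traversal source. The sketch below is only an illustration: the helper name with_timeout is hypothetical, the option name 'evaluationTimeout' is taken from the error message above, and older servers may expect a different option name. It does not explain why the Spark job itself does not finish.

```python
def with_timeout(g, millis):
    """Return a traversal source that asks the server for a per-request timeout."""
    # 'evaluationTimeout' is the setting named in the error message above;
    # that the server honours it for this request is an assumption.
    return g.with_("evaluationTimeout", millis)
```

With a remote traversal source g, with_timeout(g, 600000).V().count().next() would ask the server to allow ten minutes for that single request.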

Thanks,
RLA.




HadoopMarc <bi...@...>
 

To be sure: when you open 'conf/hadoop-graph/read-cql-standalone-cluster.properties' in the gremlin console without using Gremlin Server (no remote session), does the OLAP query with SparkGraphComputer run fine?

And what does "spark in running state" mean? Is there a SparkContext and a Spark UI? Is the first task running? Is there anything interesting in the Spark logs, compared to the run without Gremlin Server?
Best wishes,    Marc



Real Life Adventure <srinu....@...>
 

Thanks, Marc.




Real Life Adventure <srinu....@...>
 

While the gremlin query runs, I can see the Spark job in the console.
I only get Spark job related logs; I can't find any query logs in Spark.
Any help is appreciated.
Thanks,
RLA.





HadoopMarc <bi...@...>
 

While the Spark job runs, the Spark UI is available at http://localhost:4040
In the Spark UI you can go to the Executors tab and find the executor stderr logs. In other tabs you will see (maybe empty) lists of jobs and tasks.
The gremlin console itself shows the Spark driver logs, including the start of the context, jobs and tasks.
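The same job list can also be read without a browser, because a running Spark driver serves a monitoring REST API under /api/v1 on the UI port. The sketch below is not part of the original advice: the helper names are hypothetical, and it assumes the driver runs on localhost with the UI on port 4040.

```python
import json
from urllib.request import urlopen


def spark_api_url(path, host="localhost", port=4040):
    """Build a URL of the Spark monitoring REST API (host and port are assumptions)."""
    return "http://{}:{}/api/v1/{}".format(host, port, path.lstrip("/"))


def fetch_json(url, timeout=5):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def list_jobs(host="localhost", port=4040):
    """Return (application id, job id, job status) for each job known to the driver."""
    rows = []
    for app in fetch_json(spark_api_url("applications", host, port)):
        jobs_url = spark_api_url("applications/{}/jobs".format(app["id"]), host, port)
        rows.extend((app["id"], job["jobId"], job["status"]) for job in fetch_json(jobs_url))
    return rows
```

Calling list_jobs() while the OLAP query is running should show whether a job was submitted at all and whether it is still in the RUNNING state; an empty list would suggest that no Spark job was started.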

HTH,    Marc

Op maandag 22 juni 2020 18:03:58 UTC+2 schreef Real Life Adventure:

while running gremlin query i can see spark job in console.
iam just getting spark job related logs only. i dont find quey logs in spark.
any help appreciated.
Thanks,
RLA.



On Monday, June 22, 2020 at 9:31:14 PM UTC+5:30, Real Life Adventure wrote:
Thanks marc.


On Monday, June 22, 2020 at 9:25:37 PM UTC+5:30, HadoopMarc wrote:
To be sure, when you open 'conf/hadoop-graph/read-cql-standalone-cluster.properties' in gremlin console without using gremlin server (no remote session), then the OLAP query with SparkGraphComputer runs fine?

And what means "spark in running state"? There is a sparkcontext and spark UI? Is the first task running? Anything interesting from the spark logs, compared to the run without gremlin server?
Best wishes,    Marc

Op maandag 22 juni 2020 15:26:49 UTC+2 schreef Real Life Adventure:
I have attached cofiguration files.

iam running following commands on gremlin console.


gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[93c754d4-a785-4f27-8676-f0756d8b6a96]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[93c754d4-a785-4f27-8676-f0756d8b6a96] - type ':remote console' to return to local mode
gremlin> graph
==>standardjanusgraph[cql:[cassandra_dev_node1]]
gremlin> g.V().count()
==>0
gremlin> graph1 = GraphFactory.open('conf/hadoop-graph/read-cql-standalone-cluster.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g1 = graph1.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g1.V().count()
Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [g1.V().count()] - try increasing the timeout with the :remote command
Type ':help' or ':h' for help.
Display stack trace? [yN]

Thanks,
RLA.

On Monday, June 22, 2020 at 6:36:54 PM UTC+5:30, HadoopMarc wrote:
Hi RLA,

To look into this further, I need the Gremlin Server YAML config file and the HadoopGraph properties file that the YAML file refers to.

Best wishes,     Marc

On Monday, 22 June 2020 10:28:35 UTC+2, Real Life Adventure wrote:
Thanks for the reply.

                I am running the Spark job on an 8-core machine and the graph has no data. However, the operation g.V().count() times out.
                I have tested with different Spark configurations and still face the same error.

Thanks,
RLA.

On Saturday, June 20, 2020 at 2:16:53 PM UTC+5:30, HadoopMarc wrote:

It all depends on the size of your graph and the query you run. How long does the query take as an OLTP query?

OLAP queries do a full graph scan, and SparkGraphComputer lets you parallelize this over the number of Spark executors times the cores per executor. So, if you have 4 Spark executors with 5 cores each, your parallelism factor is 20, and you may hope that your OLAP query runs maybe a factor of 10 faster than its OLTP sister (Spark involves a lot of overhead).
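
The arithmetic above can be written out as a small sketch. The function name and the overhead factor of 2 are illustrative assumptions, chosen only to reproduce the "factor of 10" estimate; they are not measured values.

```python
def expected_speedup(executors, cores_per_executor, overhead_factor=2.0):
    """Rough OLAP-vs-OLTP estimate: parallelism divided by an assumed Spark overhead."""
    parallelism = executors * cores_per_executor
    return parallelism, parallelism / overhead_factor


parallelism, speedup = expected_speedup(executors=4, cores_per_executor=5)
print(parallelism, speedup)  # 20 10.0
```

On an empty graph there is nothing to parallelize, so the run time is dominated by Spark startup and scheduling overhead rather than by this factor.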

Cheers,     Marc

