|
Write vertex program output to HDFS using SparkGraphComputer
Hi All,
I am running SparkGraphComputer using the TinkerPop library on YARN. I am able to run the vertex program successfully and write the final output to a mount location. But I want to make the program write
By anjanisingh22@... · #6560
|
|
Re: Nodes with lots of edges
Hi Joe,
You do not describe whether breaking this rule of thumb causes real performance issues in your case. Anyway, JanusGraph allows you to partition the stored edges of a node, see:
By hadoopmarc@... · #6559
|
|
Nodes with lots of edges
I've noticed that the max partition size on Cassandra can get extremely large if you have a node with lots of edges. The max partition size on the edgestore table on a graph I'm working on is over
By Joe Obernberger · #6558
|
|
Re: Graph corruption?
Thank you Kevin and Boxuan for the help on this. I was scratching my head on this and decided to blow away the graph and try again. Every time, I built a small graph and exported to
By Joe Obernberger · #6557
|
|
Re: Getting Edges - Performance
Hi Joe,
> The java code talks directly to cassandra via the Janusgraph library. Should I be using a Gremlin server instead?
I see, so you are using embedded JanusGraph. It has less overhead compared
By Boxuan Li · #6556
|
|
Re: Graph corruption?
Hi Joe,
I just wanted to check a few things:
1. Did you happen to enable `storage.batch-loading`? See https://docs.janusgraph.org/operations/bulk-loading/#batch-loading
2. IIRC you are using
By Boxuan Li · #6555
|
|
Re: Graph corruption?
Joe,
See https://groups.google.com/g/janusgraph-users/c/foaqfG-MB5E/m/tsNnkhPtBwAJ for a discussion on perhaps the same issue you are running into.
Long story short, it appears you have a unique
By Kevin Schmidt · #6554
|
|
Graph corruption?
Hi all - I'm seeing this from a recent graph I built:
gremlin> :> g.V(4162).valueMap()
==>{source=[DS_106], sourceName=[GDELTRecord3]}
gremlin> :> g.V(4146).valueMap()
==>{source=[DS_106],
By Joe Obernberger · #6553
|
|
Re: Getting Edges - Performance
Hi Boxuan -
Cluster is a 15 node cassandra cluster:
nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/
By Joe Obernberger · #6552
|
|
Re: Threaded Operations - Quarkus
Gotcha. The recommended way is to use TinkerPop API to do CRUD operations, i.e. GraphTraversalSource.addV() is recommended as it is considered the “standard” way.
The `addV` method inherently
By Boxuan Li · #6551
|
|
Re: Getting Edges - Performance
Your Cassandra statistics show pretty low latency, so the problem does not seem to be related to Cassandra. I would check if it’s a network problem. It might also be helpful if you could provide
By Boxuan Li · #6550
|
|
Re: Getting Edges - Performance
One thing of note is the tablehistogram for the graphindex table:
nodetool tablehistograms graphsource.graphindex
graphsource/graphindex histograms
Percentile Read
By Joe Obernberger · #6549
|
|
Re: Poor load balancing in 0.5.3
I suspect this is an issue with gremlin java-driver. I just reported here: https://issues.apache.org/jira/browse/TINKERPOP-2766
I believe you should provide all hostnames for now.
Best regards,
Boxuan
By Boxuan Li · #6548
|
|
Re: Getting Edges - Performance
Thank you Boxuan - the code (REST service) that is modifying the graph is being called continuously when running. A slow example looks like this:
Metrics: Traversal Metrics
Step
By Joe Obernberger · #6547
|
|
Re: Getting Edges - Performance
Profiler documentation is available here: https://tinkerpop.apache.org/docs/current/reference/#profile-step
Can you do
traversal.E().has("edgeID", edgeID).profile().next()
and paste the output (when
By Boxuan Li · #6546
|
|
Re: Getting Edges - Performance
Looking for documentation on how to do profile() - do you have any?
Queries like this are also slow:
Edge dataSourceToCorrelationEdge = traversal.E().has("edgeID",
By Joe Obernberger · #6545
|
|
Poor load balancing in 0.5.3
Hi folks,
From what we are seeing in the gremlin driver code, it is expected that all gremlin server host names are provided and not a VIP. Is that correct?
For example,
Cluster cluster =
By Doug Whitfield · #6544
|
|
Re: Threaded Operations - Quarkus
Sorry for the delay.
When a request comes in via REST, Quarkus creates a thread to handle it; I believe it actually comes from a thread pool.
This code now does:
By Joe Obernberger · #6543
|
|
Re: Getting Edges - Performance
It's very suspicious. It shouldn't take 3 seconds just to load 3 edges. Can you provide the profile() output here when you see such slowness? It is also worth trying if other queries (e.g. loading a
By Boxuan Li · #6542
|
|
Re: Threaded Operations - Quarkus
When you say use JanusGraph.tx().createThreadedTx() directly, what do you mean? Can you give an example?
By Boxuan Li · #6541
|