Bulk Loading with Spark
Joe Obernberger
Hi All - I'm trying to use spark to do a bulk load, but it's very slow.
The Cassandra cluster I'm connecting to is a bare-metal, 15-node cluster. I'm using Java code to do the loading, calling GraphTraversalSource.addV and Vertex.addEdge in a loop. Is there a better way?
Thank you!
-Joe
|
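One thing that is often worth checking in an addV/addEdge loop like the one described above is how often the transaction is committed. The thread does not say how the loop commits, so the following is only a minimal sketch of committing once per batch rather than once per element. `BatchedLoader`, `writeOne`, and `commit` are hypothetical names for illustration and are not part of JanusGraph or TinkerPop; in real code the two callbacks would wrap the actual write calls and something like `graph.tx().commit()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not a JanusGraph/TinkerPop class): writes items one at a
// time and invokes the commit callback once every batchSize writes.
final class BatchedLoader<T> {
    private final int batchSize;
    private final Consumer<T> writeOne; // assumption: wraps addV / addEdge in real code
    private final Runnable commit;      // assumption: wraps the transaction commit
    private int pending = 0;
    private int commits = 0;

    BatchedLoader(int batchSize, Consumer<T> writeOne, Runnable commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.writeOne = writeOne;
        this.commit = commit;
    }

    // Write one item and commit if the batch is full.
    void add(T item) {
        writeOne.accept(item);
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    // Commit any uncommitted writes; call once more after the loop ends.
    void flush() {
        if (pending == 0) {
            return;
        }
        commit.run();
        commits++;
        pending = 0;
    }

    int commitCount() {
        return commits;
    }
}

final class BatchedLoaderDemo {
    public static void main(String[] args) {
        // Stand-in for the graph: the "writes" just go into a list.
        List<String> written = new ArrayList<>();
        BatchedLoader<String> loader =
                new BatchedLoader<>(10, written::add, () -> { /* commit would go here */ });
        for (int i = 0; i < 25; i++) {
            loader.add("vertex-" + i);
        }
        loader.flush(); // commit the final partial batch
        System.out.println(written.size() + " writes, " + loader.commitCount() + " commits");
    }
}
```

The batch size is a tuning knob, not a recommendation; a suitable value depends on element size and the storage backend settings.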
Joe Obernberger
Should have added - I'm connecting with:
JanusGraph graph = JanusGraphFactory.build()
    .set("storage.backend", "cql")
    .set("storage.hostname", "charon:9042, chaos:9042")
    .set("storage.cql.keyspace", "graph")
    .set("storage.cql.cluster-name", "JoeCluster")
    .set("storage.cql.only-use-local-consistency-for-system-operations", "true")
    .set("storage.cql.batch-statement-size", 256)
    .set("storage.cql.local-max-connections-per-host", 8)
    .set("storage.cql.read-consistency-level", "ONE")
    .set("storage.batch-loading", true)
    .set("schema.default", "none")
    .set("ids.block-size", 100000)
    .set("storage.buffer-size", 16384)
    .open();

-Joe

On 5/20/2022 5:28 PM, Joe Obernberger via lists.lfaidata.foundation wrote:
> Hi All - I'm trying to use spark to do a bulk load, but it's very slow.
|
hadoopmarc@...
Hi Joe,
What is slow? Can you please check the Expero blog series and compare to their reference numbers (per parallel Spark task):
https://www.experoinc.com/post/janusgraph-nuts-and-bolts-part-1-write-performance
Best wishes,
Marc
|
Joe Obernberger
Thank you Marc - something isn't right with my code - debugging.
Right now the graph is 4,339,690 vertices and 15,707,179 edges, but that took days to build, and is probably 5% of the data.
-Joe

On 5/22/2022 7:53 AM, hadoopmarc@... wrote:
> Hi Joe,
|
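To compare against per-task reference numbers, the counts quoted above can be turned into a rough projected size and write rate. This is a minimal sketch: `LoadEstimate` and its methods are hypothetical names, the 5% figure is the "probably 5%" guess from the message, and the elapsed time is an explicit placeholder because the thread only says "days".

```java
// Hypothetical back-of-the-envelope helper; not code from the thread.
final class LoadEstimate {

    // Project the full data set size from the loaded count and the
    // estimated percentage of the data loaded so far.
    static long estimateTotal(long loaded, int percentDone) {
        if (percentDone <= 0 || percentDone > 100) {
            throw new IllegalArgumentException("percentDone must be in 1..100");
        }
        return loaded * 100 / percentDone;
    }

    // Average write rate in elements per second over the elapsed days.
    static double elementsPerSecond(long elements, double days) {
        if (days <= 0) {
            throw new IllegalArgumentException("days must be positive");
        }
        return elements / (days * 86_400.0);
    }

    public static void main(String[] args) {
        long vertices = 4_339_690L;   // from the message
        long edges = 15_707_179L;     // from the message
        int percentDone = 5;          // "probably 5%" per the message
        double assumedDays = 2.0;     // ASSUMPTION: placeholder, the thread only says "days"

        System.out.println("Projected vertices: " + estimateTotal(vertices, percentDone));
        System.out.println("Projected edges:    " + estimateTotal(edges, percentDone));
        System.out.println("Elements/second (assumed " + assumedDays + " days): "
                + elementsPerSecond(vertices + edges, assumedDays));
    }
}
```

If the 5% guess holds, the full graph would be about 86,793,800 vertices and 314,143,580 edges. The rate only becomes meaningful once the real elapsed time and the number of parallel tasks are substituted.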