REINDEXING Big Graph


Abhay Pandit
 

Hi Team,
I am currently trying to run a REINDEX with Hadoop MapReduce, following the example in the JanusGraph documentation:
https://docs.janusgraph.org/index-management/index-reindexing/#reindex-example-on-mapreduce

I wrote my implementation in Java. It runs fine, but only in local mode.
To run in cluster mode I need to pass the Hadoop configuration, but the documentation does not make clear how to pass any external configuration so that the job runs on Hadoop or on a YARN cluster.
If anybody has tried this against a big graph, say one with a billion nodes, can you guide me?

My Java implementation:
// Open the graph and look up the index to rebuild.
JanusGraph janusGraph = JanusGraphFactory.open(janusConfig);
JanusGraphManagement management = janusGraph.openManagement();
JanusGraphIndex graphIndex = management.getGraphIndex("AddressId");
// Submit the REINDEX as a MapReduce job and block until it completes.
MapReduceIndexManagement mapReduceIndexManagement = new MapReduceIndexManagement(janusGraph);
ScanMetrics scanMetrics = mapReduceIndexManagement.updateIndex(graphIndex, SchemaAction.REINDEX).get();

janusConfig:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1
storage.port=9042
storage.keyspace=janusgraph
cache.db-cache = false
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

Console log:
[INFO] 2021-02-17 13:37:55,173 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.LocalJobRunner - {} -
[INFO] 2021-02-17 13:37:56,141 task-1 org.apache.hadoop.mapreduce.Job - {} -  map 14% reduce 0%
[INFO] 2021-02-17 13:37:57,384 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.Task - {} - Task:attempt_local67526867_0001_m_000035_0 is done. And is in the process of committing
[INFO] 2021-02-17 13:37:57,384 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.LocalJobRunner - {} - map
[INFO] 2021-02-17 13:37:57,384 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.Task - {} - Task 'attempt_local67526867_0001_m_000035_0' done.
[INFO] 2021-02-17 13:37:57,385 LocalJobRunner Map Task Executor #0 org.apache.hadoop.mapred.Task - {} - Final Counters for attempt_local67526867_0001_m_000035_0: Counters: 16

Thanks,
Abhay


hadoopmarc@...
 

Hi Abhay,

The Hadoop client picks up its configs from the JVM classpath. So, simply add /etc/hadoop/conf (or whichever folder holds hdfs-site.xml and the other cluster configs) to your classpath. I have never done this myself for the indexing MR jobs, nor seen it on this forum, so you may well encounter further barriers...
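
A quick way to check whether the config folder really is on the classpath is to ask the class loader for the site files before submitting the job. The sketch below uses only the JDK; the class name HadoopClasspathCheck and its locate helper are hypothetical, and the list of file names assumes a standard Hadoop client layout (your cluster may use only some of them).

```java
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JanusGraph or Hadoop.
public class HadoopClasspathCheck {

    // Assumed standard Hadoop client config file names.
    static final String[] CONFIG_FILES = {
        "core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml"
    };

    /** Maps each resource name to the URL it resolves to on the classpath, or null if absent. */
    static Map<String, URL> locate(ClassLoader loader, String... names) {
        Map<String, URL> found = new LinkedHashMap<>();
        for (String name : names) {
            found.put(name, loader.getResource(name));
        }
        return found;
    }

    public static void main(String[] args) {
        // Use the same class loader the application code runs with.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        locate(loader, CONFIG_FILES).forEach((name, url) ->
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url)));
    }
}
```

Run it with the same classpath as the reindex program, for example java -cp /etc/hadoop/conf:<your jars> HadoopClasspathCheck. If mapred-site.xml is reported as missing, the job will most likely keep using the LocalJobRunner seen in the console log, since Hadoop falls back to local mode when mapreduce.framework.name is not set to yarn.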

HTH,    Marc