Hello everyone,
does anyone have experience running OLAP on an AWS EMR cluster?
I am currently trying to, but I am seeing some strange behavior.
The first issue is that the application does not use the entire cluster, even though I specified both driver and executor parameters in the properties file. Whatever I set there, only 2 executors are spawned, while the cluster I tried could support at least 90. I can see the jobs in the cluster's Hadoop and Spark UIs, and other properties (such as the default parallelism) are read correctly and applied to the jobs.
Second, I cannot get correct output. I started from the example properties file that uses CQL, but the queries I run from the Gremlin console return nothing meaningful (the data is there, because I can query it without Spark). The classic vertex count returns zero, and extracting a given set of properties returns nothing. I noticed that the configuration sets the GraphWriter to NullOutputFormat, so I tried switching it to the Gryo one, but nothing changed, and I am not sure it is supported by the rest of the configuration.
Thank you for your help,
Alessandro