Strange behaviors for JanusGraph 0.5.3 on AWS EMR
asivieri@...
Hi Marc,
yes, the deployMode was specified in the Gremlin Console and not in the properties file, as in the TinkerPop example, so that's why it was not explicit here. I am not sure why EMR would be limiting anything, since other Spark applications spawn more executors. But I am still investigating this; I will compare the entire properties list (which is reported in the Spark UI as well), maybe there is something different.

For the output folder, yes, it is working correctly in a way: I tried executing the CloneVertexProgram and it creates 768 files, all empty. And by zero I mean 0, while any other query (such as valueMap()) returns just nothing.

Best regards,
Alessandro
hadoopmarc@...
Hi Alessandro,
The executors tab of the Spark UI shows the product of spark.executor.instances times spark.executor.cores. I guess spark.executor.instances defaults to one, and EMR might limit the number of executor cores? It also won't hurt to explicitly specify spark.submit.deployMode=client, assuming EMR allows it. I am not sure whether the Gremlin Console needs client mode to have the count results returned. And with a "zero" result in the Gremlin Console, did you mean 0 or just ==> ?

For having written output on "output", you have to configure the distributed storage, so that "output" is a path on hadoop-hdfs (each executor writes its output to a partition on the distributed storage, so you would have 768 partitions in the output directory). Be aware that TinkerPop uses somewhat strange naming in the output directory.

Best wishes,
Marc
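Marc's suggestions above would look roughly like this in the graph's .properties file (a sketch only; the instance/core/memory values are placeholders for illustration, not recommendations):

```properties
# Spark on YARN (EMR): request executors explicitly; the defaults may give
# far fewer executors than the cluster could host.
spark.master=yarn
spark.submit.deployMode=client
spark.executor.instances=12
spark.executor.cores=4
spark.executor.memory=8g
```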
asivieri@...
By the way, if you have any properties file or running example of OLAP that you would like to share, I'd be happy to see something working and compare it to what I am trying to do!
Best regards,
Alessandro
asivieri@...
Hi,
here are the properties that I am setting so far (plus the same ones that are set in the TinkerPop example, such as the classpath for the executors and the driver):
On the Spark UI I can see that the number of tasks for the first job matches the number of tokens in our Scylla cluster (256 tokens per node * 3 nodes), but only two executors are spawned, even though I tried on a cluster with 96 cores and 768 GB of RAM, which, given the driver and executor configuration you can see in the properties, should allocate a lot more than 2.

Moreover, I wrote a dedicated Java application that replicates the first step of the SparkGraphComputer, which is the step where the entire vertex list is read into an RDD. So basically I skipped the Gremlin Console entirely, started a "normal" Spark session as we do in our applications, and then read the entire vertex list from Scylla. In this case the job has the same number of tasks as before, but the number of executors is the one I expected, so it seems to me that something in the Spark context creation performed by Gremlin is limiting this number; maybe I am missing a configuration.

The problem of empty results, however, remained: in this test the output RDD is completely empty, even though the DEBUG logs show that it is connecting to the correct keyspace, where there is some data present. There are no exceptions, so I am not sure why we are not reading anything.

Am I missing some properties, in your opinion/experience?

Best regards,
Alessandro
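For scale, the mismatch Alessandro describes can be sanity-checked with simple arithmetic. A minimal sketch of the resource bound (the per-executor size of 4 cores / 32 GB is an assumption for illustration, not his actual configuration):

```python
def max_executors(cluster_cores, cluster_mem_gb, exec_cores, exec_mem_gb):
    """Upper bound on executors YARN could host: whichever resource runs out first."""
    return min(cluster_cores // exec_cores, cluster_mem_gb // exec_mem_gb)

# 96-core / 768 GB cluster with hypothetical 4-core / 32 GB executors:
print(max_executors(96, 768, 4, 32))  # -> 24, far more than the 2 observed
```

Whatever the exact executor sizing, the cluster should comfortably host well over two executors, which is why the limit looks like a configuration issue rather than a resource one.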
hadoopmarc@...
Hi Alessandro,
Yes, please include the properties file. To be clear: you see spark.master=yarn and spark.executor.instances=12 in the Spark web UI, and only two executors for 700+ tasks show up, while other jobs using the same EMR account spawn tens of executors? Is there any yarn queue you have to specify to get more resources from yarn? It sounds like some limit in the yarn ResourceManager.

Best wishes,
Marc
kndoan94@...
Hi Alessandro,
I'm also working through a similar use case with AWS EMR, but I'm running into some Hadoop class errors. What version of EMR are you using? Additionally, if you could pass along the configuration details in your .properties file, that would be extremely helpful :)

Thank you!
Ben
asivieri@...
Hi Marc,
the TinkerPop example works correctly. We are actually using Scylla, and with 256 tokens per node I am getting 768 tasks in the Spark job (which I correctly see listed in the UI). The problems I have are that a) only 2 executors are spawned, which does not make much sense since I have configured executor cores and memory in the properties file and the cluster has resources for more than 2, and b) no data is being transmitted back from the cluster, even though performing similar (limited) queries without Spark produces results.

Best regards,
Alessandro
hadoopmarc@...
Hi Alessandro,
I assume Amazon EMR uses hadoop-yarn, so you need to specify spark.master=yarn, see: https://tinkerpop.apache.org/docs/current/recipes/#olap-spark-yarn

Once you can run the TinkerPop example, you can try and switch to JanusGraph. You have to realize that JanusGraph does not do a good job (yet) of partitioning the input data from a storage backend. Basically, when using cql, you get the partitions as used by Cassandra. So with 1 or 2 Spark partitions, there is no need to fire up 90 executors.

Best wishes,
Marc
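For reference, a sketch of what a hadoop-graph properties file for OLAP reads over cql typically contains (hostnames and keyspace are placeholders, not the poster's actual values):

```properties
# Read a JanusGraph/cql backend as a HadoopGraph for SparkGraphComputer.
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

# Storage backend (placeholders: point these at your own cluster/keyspace).
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=node1,node2,node3
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph

# Spark on YARN, as discussed above.
spark.master=yarn
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
```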
asivieri@...
Hello everyone,
is there anyone with experience of running OLAP on an AWS EMR cluster? I am currently trying to do so, but strange things are happening.

The first one is that the application is not running on the entire cluster, even though I specified both driver and executor parameters in the properties file. Regardless of what I write there, only 2 executors are spawned, while the cluster on which I tried could support at least 90. I can see the jobs on the Hadoop and Spark UIs of the cluster, and other properties (such as default parallelism) are correctly read and used in jobs.

Moreover, I seem to have problems in getting the correct output: I started from the properties example that uses CQL, but I do not receive any meaningful answer to queries that I run in the Gremlin Console (the data is there, because I am able to query it without Spark). The classic vertex count returns zero, and trying to extract a certain set of properties does not return anything. I saw that the conf shows, as graphWriter, a NullOutputFormat, so I tried to set the Gryo one in there, but nothing changed, and I am not sure it is supported by the rest of the configuration.

Thank you for your help,
Alessandro
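The graphWriter change Alessandro mentions would be a two-line edit along these lines (a sketch; the output path is a placeholder, and whether it resolves to HDFS depends on the Hadoop configuration on the EMR cluster):

```properties
# Persist results with Gryo instead of discarding them via NullOutputFormat;
# the output location should be a path on the cluster's distributed storage.
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.outputLocation=output
```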