Re: ID block allocation exception while creating edge

hadoopmarc@...
 

Hi Anjani,

One thing that does not feel right is that you create and commit a transaction for every row of your dataframe. Although I do not see how this would interfere with ID allocation, best practice is to have partitions of about 10,000 vertices/edges and commit each partition as one batch. In case of an exception, you roll back the transaction and raise your own exception; Spark will then retry the partition and your job will still succeed. It is worth a try.
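A minimal sketch of that pattern (untested, reusing your JanusGraphConfig.getJanusGraph() helper from the code further down this thread; it uses org.janusgraph.core.JanusGraphTransaction):

pairRDD.foreachPartition(partIterator -> {
    // one shared JanusGraph instance per executor
    JanusGraph graph = JanusGraphConfig.getJanusGraph(janusConfig);
    JanusGraphTransaction tx = graph.newTransaction();
    try {
        partIterator.forEachRemaining(tuple -> {
            // create the vertex/edge for this tuple inside the open transaction,
            // e.g. tx.addVertex(...) and then addEdge(...) on the created vertices
        });
        tx.commit();                 // one commit for the whole partition
    } catch (Exception e) {
        tx.rollback();               // Spark will retry the partition
        throw new RuntimeException("Partition failed and will be retried", e);
    }
});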

Best wishes,    Marc


Re: Making janus graph client to not use QUORUM

anjanisingh22@...
 

Thanks Marc, I will try that option.


Re: ID block allocation exception while creating edge

anjanisingh22@...
 

Sharing details on how I am creating nodes/edges, to make sure nothing in that code is causing the ID allocation failures.

 

I am creating one static JanusGraph instance on each Spark worker box and using it to create and commit multiple transactions.

pairRDD.foreachPartition(partIterator -> {
    partIterator.forEachRemaining(tuple -> {
        createNodeAndEdge(tuple, JanusGraphConfig.getJanusGraph(janusConfig));
    });
});

where JanusGraphConfig.getJanusGraph() returns the static instance.

 

In the createNodeAndEdge() method I create a GraphTraversalSource from the static JanusGraph instance, create the node and edge, commit, and then close the GraphTraversalSource, as shown in the pseudocode below:

void createNodeAndEdge(Tuple2<K, V> pair, JanusGraph janusGraph) {
    GraphTraversalSource g = janusGraph.buildTransaction().start().traversal();
    try {
        // create node
        // create edge
        g.tx().commit();
    } catch (Exception e) {
        g.tx().rollback();
    } finally {
        g.tx().close();
        g.close();
    }
}

 

Thanks,
Anjani


Re: ID block allocation exception while creating edge

anjanisingh22@...
 

Thanks for the response, Marc. Yes, I also think the changes are not getting picked up for some reason, but I am not able to figure out why.

ids.block-size is updated in the config file of all JanusGraph nodes, and after that all nodes are restarted.

In code I have only one method that creates the JanusGraph instance, and the same instance is passed to the method for node/edge creation.

Yes, IDS_BLOCK_SIZE equals "ids.block-size".

Thanks,
Anjani


Re: ID block allocation exception while creating edge

hadoopmarc@...
 

Hi Anjani,

It is still most likely that the modified value of "ids.block-size" somehow does not come through. So, are you sure that:
  • all JanusGraph instances are closed before the new value is used ("ids.block-size" has GLOBAL_OFFLINE mutability level)? Safest is to have a fresh keyspace and one location for the properties used for both graph creation and bulk loading (see the sketch below this list).
  • sorry for asking: does IDS_BLOCK_SIZE equal "ids.block-size"?
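A sketch of changing a GLOBAL_OFFLINE option through the management system (untested; the properties file and value are just examples, and all other JanusGraph instances must be closed first):

// uses org.janusgraph.core.JanusGraphFactory and org.janusgraph.core.schema.JanusGraphManagement
JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql.properties");
JanusGraphManagement mgmt = graph.openManagement();
mgmt.set("ids.block-size", 10000000);   // example value only
mgmt.commit();
graph.close();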
Best wishes,    Marc


Re: ID block allocation exception while creating edge

anjanisingh22@...
 

Hi Marc,

I tried setting ids.num-partitions = number of executors through code, not directly in the JanusGraph global config files, but no luck. I added the properties below, but it didn't help.
configProps.set("ids.renew-timeout", "240000");
configProps.set("ids.renew-percentage", "0.4");
configProps.set("ids.num-partitions", "253");

Thanks,
Anjani


Re: MapReduce reindexing with authentication

Boxuan Li
 

Hi Marc,

That is an interesting solution. I was not aware of the mapreduce.application.classpath property. It is not well documented, but from what I understand, this option is used primarily to distribute the mapreduce framework rather than user files. Glad to know it can be used for user files as well.
I am not 100% sure, but it seems it requires you to upload the file to HDFS first (if you are using a YARN cluster). ToolRunner, however, can add a file from the local filesystem too. We prefer not to store keytab files on HDFS permanently. This difference is subtle, though. Also, we don't use the Gremlin Console anyway, so not being able to do this via the Gremlin Console is not a drawback for us.

Agree with you that the documentation can be enhanced. Right now it simply says “The class starts a Hadoop MapReduce job using the Hadoop configuration and jars on the classpath.”, which is too brief and assumes users have a good knowledge of Hadoop MapReduce.

> One could even think of putting the mapreduce properties in the graph properties file and pass on properties of this namespace to the mapreduce client.

Not sure if it's possible, but if someone implements it, it would be very helpful for letting users get started quickly without worrying about the cumbersome Hadoop configs.

Best regards,
Boxuan


Re: MapReduce reindexing with authentication

hadoopmarc@...
 

Hi Boxuan,

Yes, you are right, I mixed things up by wrongly interpreting GENERIC_OPTIONS as an env variable. I did some additional experiments, though, bringing in new information.

1. It is possible to put a mapred-site.xml file on the JanusGraph classpath that is automatically loaded by the mapreduce client. When using the file below during mapreduce reindexing, I get the following exception (on purpose):

gremlin> mr.updateIndex(i, SchemaAction.REINDEX).get()
java.io.FileNotFoundException: File file:/tera/lib/janusgraph-full-0.5.3/hi.tgz does not exist

The mapreduce config parameters are listed in https://hadoop.apache.org/docs/r2.7.3/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
The description for mapreduce.application.framework.path suggests that you can pass additional files to the mapreduce workers using this option (without any changes to JanusGraph).

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>dummy</value>
  </property>
  <property>
    <name>mapreduce.application.framework.path</name>
    <value>hi.tgz</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
</configuration>

2. When using mapreduce reindexing in the documented way, it already issues the following warning:
08:49:55 WARN  org.apache.hadoop.mapreduce.JobResourceUploader  - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.

If you were to resolve your keytab issue by modifying the JanusGraph code and calling the Hadoop ToolRunner, you would have the additional advantage of getting rid of this warning. This would not work from the Gremlin Console, though, unless gremlin.sh passed the additional command line options to the java command line (ugly).

So, I think I would prefer the option with mapred-site.xml. It would not hurt to slightly extend the mapreduce reindexing documentation, anyway:
  • when calling from the Gremlin Console, you need an "import org.janusgraph.hadoop.MapReduceIndexManagement" (see the sketch below this list)
  • mapreduce has a default setting mapreduce.framework.name=local. Where do you set mapreduce.framework.name=yarn for using your cluster? One could even think of putting the mapreduce properties in the graph properties file and pass on properties of this namespace to the mapreduce client.
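For example, a rough sketch of the documented call from plain Java (the index name and properties file are placeholders; it assumes the Hadoop jars and a suitable mapred-site.xml are on the classpath):

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.schema.JanusGraphManagement;
import org.janusgraph.core.schema.SchemaAction;
import org.janusgraph.hadoop.MapReduceIndexManagement;

public class Reindex {
    public static void main(String[] args) throws Exception {
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql.properties");
        JanusGraphManagement mgmt = graph.openManagement();
        MapReduceIndexManagement mr = new MapReduceIndexManagement(graph);
        // runs the reindex as a MapReduce job; "byNameComposite" is a placeholder index name
        mr.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get();
        mgmt.commit();
        graph.close();
    }
}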
Best wishes,    Marc


Re: Making janus graph client to not use QUORUM

hadoopmarc@...
 

Hi Anjani,

To see what exactly happens with local configurations, I did the following:
  • from the binary janusgraph distribution I started janusgraph with "bin/janusgraph.sh start" (this implicitly uses conf/janusgraph-cql-es.properties)
  • I made a copy of conf/janusgraph-cql-es.properties in which I added your storage.cql.read-consistency-level=LOCAL_ONE
  • In gremlin console I ran the code below (using JanusGraph in an embedded way, no remote connection):
graph = JanusGraphFactory.open('conf/janusgraph-cql-es-local-one.properties')
conf = graph.getConfiguration().getLocalConfiguration()
ks = conf.getKeys(); null;
while (ks.hasNext()) {
  k = ks.next()
  System.out.print(String.format("%30s: %s\n", k, conf.getProperty(k)))
}
With printed output:
              storage.hostname: 127.0.0.1
storage.cql.read-consistency-level: LOCAL_ONE
                cache.db-cache: true
          storage.cql.keyspace: janusgraph
               storage.backend: cql
         index.search.hostname: 127.0.0.1
           cache.db-cache-size: 0.25
                 gremlin.graph: org.janusgraph.core.JanusGraphFactory

Can you do the same printing of configurations on the client that shows the exception about the QUORUM?
In this way, we can check whether the problem is in your code or in JanusGraph not properly passing the  local configurations.

Best wishes,    Marc


Re: Backend data model deserialization

sauverma
 

Hi Elliot

At Zeotap we've taken the same route to enable OLAP consumers via Apache Spark. We presented it at the recent JanusGraph meetup: https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376. We are using ScyllaDB as the backend.

Let's get in touch if this aligns with your requirements.

Thanks 


Re: Backend data model deserialization

Boxuan Li
 

Hi Elliot,

I am not aware of existing utilities for deserialization, but as Marc has suggested, you might want to see if there are old Titan resources regarding it, since the data model hasn’t been changed since Titan -> JanusGraph migration.

If you want to resort to the source code, you could check out EdgeSerializer and IndexSerializer. Here is a simple code snippet demonstrating how to deserialize an edge:

final Entry colVal = StaticArrayEntry.of(StaticArrayBuffer.of(Bytes.fromHexString("0x70a0802140803800")), StaticArrayBuffer.of(Bytes.fromHexString("0x0180a076616c75e5"))); // I retrieved this hex string from Cassandra cqlsh console
final StandardSerializer serializer = new StandardSerializer();
final EdgeSerializer edgeSerializer = new EdgeSerializer(serializer);
RelationCache edgeCache = edgeSerializer.readRelation(colVal, false, (StandardJanusGraphTx) tx); // this is the deserialized edge

Bytes.fromHexString is a utility method provided by the DataStax Cassandra driver. You can use any other library/code to convert a hex string to bytes.
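If the DataStax driver is not on the classpath, a tiny stand-in could look like this (hypothetical helper; it strips an optional "0x" prefix):

static byte[] hexToBytes(String hex) {
    String s = hex.startsWith("0x") ? hex.substring(2) : hex;
    byte[] out = new byte[s.length() / 2];
    for (int i = 0; i < out.length; i++) {
        out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
    }
    return out;
}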

As you can see, there is no single easy-to-use API to deserialize raw data. If you end up creating one, I think it would be helpful if you could contribute back to the community.

Best regards,
Boxuan


Re: Backend data model deserialization

hadoopmarc@...
 

Hi Elliot,

There should be some old Titan resources that describe how the data model is binary coded into the row keys and row values. Of course, it is also implicit from the JanusGraph source code.

If you look back at this week's OLAP presentations (https://lists.lfaidata.foundation/g/janusgraph-users/topic/janusgraph_meetup_4/82939376) you will see that one of the presenters did exactly what you propose: they exported rows from ScyllaDB and converted them to Gryo format for import into TinkerPop HadoopGraph. You might want to contact them to coordinate a possible contribution to the JanusGraph project.

Best wishes,     Marc


Re: MapReduce reindexing with authentication

Boxuan Li
 

Hi Marc,

Thanks for your explanation. Just to avoid confusion, GENERIC_OPTIONS itself is not an env variable, but a set of configuration options (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html#Generic_Options). These options have nothing to do with environment variables.

If I understand you correctly, you are saying that the ToolRunner interface may not be required to submit files. I didn't try it, but I think you are right, because what it does under the hood is simply:

if (line.hasOption("files")) {
    conf.set("tmpfiles", this.validateFiles(line.getOptionValue("files"), conf), "from -files command line option");
}

which will later be picked up by the Hadoop client. So, theoretically, ToolRunner is not needed and one can set the Hadoop config themselves. This, however, does not seem to be documented officially anywhere, and it is not guaranteed that the string literal "tmpfiles" will not change in future versions.
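An untested sketch of that idea, bypassing ToolRunner (the keytab path is a placeholder and "tmpfiles" is the Hadoop-internal literal mentioned above):

import org.apache.hadoop.conf.Configuration;

Configuration hadoopConf = new Configuration();
// comma-separated list of local files to ship to the MapReduce workers
hadoopConf.set("tmpfiles", "file:///etc/security/keytabs/myuser.keytab");
// this hadoopConf would then have to be the one used for job submission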

Note that even if one wants to set "tmpfiles" themselves for the MapReduce reindex, they still need to modify the JanusGraph source code, because currently the hadoopConf object is created within the MapReduceIndexManagement class and users have no control over it.

Best regards,
Boxuan

On May 15, 2021, at 4:53 PM, hadoopmarc@... wrote:

Hi Boxuan,

Yes, I did not finish my argument. What I tried to suggest: if the hadoop CLI command checks the GENERIC_OPTIONS env variable, then maybe also the mapreduce java client called by JanusGraph checks the GENERIC_OPTIONS env variable.

The (old) blog below suggests, however, that this behavior is not present by default but requires the janusgraph code to run hadoop's ToolRunner. So, just see if this is any better than what you had in mind to implement.
https://hadoopi.wordpress.com/2013/06/05/hadoop-implementing-the-tool-interface-for-mapreduce-driver/

Best wishes,    Marc


Backend data model deserialization

Elliot Block <eblock@...>
 

Hello,

Is there any supported way (e.g. a class/API) for deserializing raw data model rows, i.e. to get from raw Bigtable bytes to Vertex/edge list objects (in Java)?

https://docs.janusgraph.org/advanced-topics/data-model/

We're on the Cloud Bigtable storage backend, and it has excellent support for bulk exporting Bigtable rows (e.g. to Parquet in GCS), but we're unclear how to deserialize the raw Bigtable row/cell bytes back into usable Vertex objects.  If we were to build support for something like this, would it be a candidate for contribution back into the project?  Or is it misunderstanding the intended API/usage path?

Any thoughts greatly appreciated.  Thank you!

- Elliot


JanusGraph Meetup #4 Recording

Ted Wilmes
 

Hello,
Thanks to all who attended the meetup yesterday. If you weren't able to make it, you can find the recording at: https://www.experoinc.com/online-seminar/janusgraph-community-meetup.

Thanks to our presenters: Marc, Saurabh, and Bruno, we had a really good set of material presented.

Thanks,
Ted


Re: Query Optimisation

hadoopmarc@...
 

Hi Vinayak,

Please study the as(), select(), project() and cap() steps from the TinkerPop ref docs. The arguments of project() do not reference the keys of side effects but rather introduce new keys for its output. The query I provided above was tested in the TinkerPop modern graph, so I repeat my suggestion to try that one first and show in what way it fails to provide sensible output.
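A small illustration on the TinkerPop modern graph (Gremlin-Java here; the property names come from the toy graph, not from your schema):

import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.outE;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.unfold;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.valueMap;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory;

GraphTraversalSource g = TinkerFactory.createModern().traversal();

// select() reads back the step labels attached earlier with as()
System.out.println(g.V().has("name", "marko").as("v1")
        .outE("knows").as("e").inV().as("v2")
        .select("v1", "e", "v2").by(valueMap().by(unfold()))
        .toList());

// project() does not look up labels or side effects; its arguments are the new
// keys of the map it emits
System.out.println(g.V().has("name", "marko")
        .project("name", "outDegree")
        .by("name")
        .by(outE().count())
        .toList());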

Best wishes,    Marc


Re: Query Optimisation

Vinayak Bali
 

Hi Marc, 

I am using the following query now.

g2.inject(1).union(
V().has('property1', 'V1').aggregate('v1').outE().has('property1', 'E1').limit(100).aggregate('e').inV().has('property2', 'V2').aggregate('v2')
).project('v1','e','v2').by(valueMap().by(unfold()))

But this only returns the elements of V2; no V1 and E1 attributes are returned. Can you please check?

Thanks & Regards,
Vinayak


On Mon, May 10, 2021 at 8:13 PM <hadoopmarc@...> wrote:
Hi Vinayak,

Actually, query 4 was easier to rework. It could read somewhat like:
g.V().has('property1', 'vertex1').as('v1').outE().has('property1', 'edge1').limit(100).as('e').inV().has('property1', 'vertex1').as('v2').
    select('v1','e','v2').by(valueMap().by(unfold())).aggregate('x').fold().
  V().has('property1', 'vertex1').as('v1').outE().has('property1', 'edge2').limit(100).as('e').inV().has('property1', 'vertex2').as('v2').
    select('v1','e','v2').by(valueMap().by(unfold())).aggregate('x').fold().
  V().has('property1', 'vertex3').as('v1').outE().has('property1', 'edge3').limit(100).as('e').inV().has('property1', 'vertex2').as('v2').
    select('v1','e','v2').by(valueMap().by(unfold())).aggregate('x').fold().
  V().has('property1', 'vertex3').as('v1').outE().has('property1', 'Component_Of').limit(100).as('e').inV().has('property1', 'vertex1').as('v2')).
    select('v1','e','v2').by(valueMap().by(unfold())).aggregate('x').fold().
  cap('x')

Best wishes,    Marc


Making janus graph client to not use QUORUM

anjanisingh22@...
 

Hi All,

I am trying to create nodes in the graph, and while reading the created node id I am getting the exception below:

org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
    at io.vavr.API$Match$Case0.apply(API.java:3174)
    at io.vavr.API$Match.of(API.java:3137)
    at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$0(CQLKeyColumnValueStore.java:125)
    at io.vavr.control.Try.getOrElseThrow(Try.java:671)
    at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:292)
    at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$3.call(KCVSConfiguration.java:177)
    at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$3.call(KCVSConfiguration.java:174)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:147)
    at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:161)
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
    ... 26 more
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency QUORUM (3 responses were required but only 2 replica responded)


I am trying to update the JanusGraph client to use LOCAL_ONE instead of QUORUM. I am setting the property below, but it is not working.

 

JanusGraphFactory.Builder configProps = JanusGraphFactory.build();
configProps.set("storage.cql.read-consistency-level", "LOCAL_ONE");

 

In the CQLTransaction class I see that it is set by reading the value from the custom transaction options:

public CQLTransaction(final BaseTransactionConfig config) {
    super(config);
    this.readConsistencyLevel = ConsistencyLevel.valueOf(getConfiguration().getCustomOption(READ_CONSISTENCY));
    this.writeConsistencyLevel = ConsistencyLevel.valueOf(getConfiguration().getCustomOption(WRITE_CONSISTENCY));
}


I did try setting the value by using the customOption() method of TransactionBuilder, but no luck.

 

GraphTraversalSource g = janusGraph.buildTransaction()
        .customOption("storage.cql.read-consistency-level", "LOCAL_ONE")
        .start().traversal();

 


Could you please help me to fix it?


Thanks & Regards,
Anjani

 


Re: [janusgraph-dev] [Meetup] JanusGraph Meetup May 18 covering JG OLAP approaches

cmilowka
 

Thank you Ted. I am also interested in watching the video on OLAP later; it is like 00:30 in Melbourne, not too easy to get up in the morning...
Regards to presenters, Christopher.


Re: [janusgraph-dev] [Meetup] JanusGraph Meetup May 18 covering JG OLAP approaches

Ted Wilmes
 

Hi Boxuan,
Yes, definitely. I'll post this under presentations on janusgraph.org. Also, I hadn't posted meetup 3 on there yet and finally tracked the link down, so that will also be up there shortly.

Thanks,
Ted


On Sun, May 16, 2021 at 10:22 AM Boxuan Li <liboxuan@...> wrote:
Hi Ted,

Thanks for organizing this! Do you have plans to record & release the video after the meetup? 10:30 ET is a bit late for some regions in APAC, so it would be great if there would be a video record.

Cheers,
Boxuan

On May 14, 2021, at 11:37 PM, Ted Wilmes <twilmes@...> wrote:

Hello,
We will be hosting a community meetup next week on Tuesday, May 18th at 9:30 central/10:30 eastern. We have a great set of speakers who will be discussing all things JanusGraph OLAP:

* Hadoop Marc who has helped many of us on the mailing list and in JG issues
* Saurabh Verma, principal engineer at Zeotap
* Bruno Berriso, engineer at Expero

If you're interested in signing up, here's the link: https://www.experoinc.com/get/janusgraph-user-group.

Thanks,
Ted
