
Re: Janusgraph query execution performance

hadoopmarc@...
 

Analytical queries require a full table scan. Some people have succeeded in speeding up analytical queries on JanusGraph using OLAP; see the older questions on OLAP and SparkGraphComputer, and
https://docs.janusgraph.org/advanced-topics/hadoop/

A special case that occurs very frequently is counting the number of vertices for each label (what you call a concept). Speeding this up is listed in the known issues:
https://github.com/JanusGraph/janusgraph/issues/926
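
For reference, such a per-label count is the following kind of traversal (a minimal Java sketch; the properties file name is illustrative):

import java.util.Map;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.T;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class LabelCount {
    public static void main(String[] args) throws Exception {
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph.properties");
        GraphTraversalSource g = graph.traversal();

        // Counting vertices per label has no index to fall back on, so it
        // visits every vertex: a full table scan on the storage backend.
        Map<Object, Long> countsByLabel = g.V().groupCount().by(T.label).next();
        System.out.println(countsByLabel);

        graph.close();
    }
}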

Best wishes,   Marc


Janusgraph query execution performance

lalwani.ritu2609@...
 

Hi,

I have used the https://github.com/IBM/expressive-reasoning-graph-store project to import a Turtle file having around 4 lakh (400,000) concepts, and this project uses JanusGraph 0.4.0.
Now, after importing, I am able to run queries.
The problem I am facing is that queries which access a small number of nodes are quite fast, but queries like counting the number of concepts in the graph (which access a large number of nodes) are very slow. Please note that I have already used indexing.

So is this due to the version of JanusGraph, 0.4.0 (quite an old version)?
Or will the performance always be like this for JanusGraph?

Any help will highly be appreciated.

Thanks!!


Re: OLAP Spark

hadoopmarc@...
 

Hi Vinayak,

JanusGraph has defined Hadoop InputFormats for its storage backends to do OLAP queries; see https://docs.janusgraph.org/advanced-topics/hadoop/

However, these InputFormats have several problems regarding performance (see the old questions on this list), so your approach could be worthwhile:

1. It is best to create these ids on ingestion of the data into JanusGraph and add them as a vertex property. If you create an index on this property, you can use these id properties for retrieval during OLAP queries.
2. Spark does this automatically if you call rdd.mapPartitions on the RDD with ids.
3. Here is the disadvantage of this approach: you simply run the gremlin query per partition of ids, but you have to merge the per-partition results afterwards outside gremlin. The merge logic differs per type of query; see the sketch below.
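
To illustrate points 2 and 3, a minimal Java sketch (the ids, the properties file and the out()-count query are illustrative assumptions, not from your setup):

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class OlapByIds {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("olap-by-ids");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Point 1: ids collected on the driver, e.g. retrieved via an
            // indexed id property; hardcoded here for illustration.
            List<Long> ids = Arrays.asList(4096L, 8192L, 12288L);
            JavaRDD<Long> idRdd = sc.parallelize(ids, 2);

            // Points 2 and 3: every partition opens its own OLTP connection
            // and runs the gremlin query for its share of the ids.
            JavaRDD<Long> partialCounts = idRdd.mapPartitions(iter -> {
                JanusGraph graph = JanusGraphFactory.open("conf/janusgraph.properties");
                GraphTraversalSource g = graph.traversal();
                long count = 0;
                while (iter.hasNext()) {
                    count += g.V(iter.next()).out().count().next();
                }
                graph.close();
                return Collections.singletonList(count).iterator();
            });

            // Merge the per-partition results outside gremlin; for a count
            // query the merge logic is a simple sum.
            long total = partialCounts.collect().stream().mapToLong(Long::longValue).sum();
            System.out.println("total: " + total);
        }
    }
}

The merge on the driver is a plain sum here; for, e.g., a groupCount query the partial maps would have to be merged key by key.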

Best wishes,     Marc


Re: Janusgraph spark on yarn error

hadoopmarc@...
 

The path of the BulkLoaderVertexProgram might be doable, but I cannot help you on that one. In the stack trace above, the yarn appmaster from spark-yarn apparently tries to communicate with HBase but finds that various libraries do not match. This failure arises because the JanusGraph distribution does not include spark-yarn and thus is not handcrafted to work with spark-yarn.

For the path without BulkLoaderVertexProgram you inevitably need a JVM language (java, scala, groovy). In this case, a spark executor is unaware of any other executors running and is simply passed a callable (function) to execute (through RDD.mapPartitions() or through a spark-sql UDF). This callable can be part of a class that establishes its own JanusGraph instance in the OLTP way. Now you only have to deal with the executor CLASSPATH, which does not need spark-yarn; the libs from the janusgraph distribution suffice.

Some example code can be found at:
https://nitinpoddar.medium.com/bulk-loading-data-into-janusgraph-part-2-ca946db26582

Best wishes,    Marc


Re: reindex job is very slow on ElasticSearch and BigTable

hadoopmarc@...
 

I mean, what happens if you try to run MapReduceIndexManagement on BigTable? Apparently you get the error message "MapReduceIndexManagement is not supported for BigTable", but I would like to see the full stack trace leading to this message, to see where the incompatibility stems from. E.g. the code in:

https://github.com/JanusGraph/janusgraph/blob/d954ea02035d8d54b4e1bd5863d1f903e6d57844/janusgraph-hadoop/src/main/java/org/janusgraph/hadoop/MapReduceIndexManagement.java

reads:
HadoopStoreManager storeManager = (HadoopStoreManager) graph.getBackend().getStoreManager().getHadoopManager();
if (storeManager == null) {
    throw new IllegalArgumentException("Store manager class " + graph.getBackend().getStoreManagerClass() + "is not supported");
}

But this is not what you see.

Best wishes,    Marc
 


OLAP Spark

Vinayak Bali
 

Hi All,

I am working on OLAP using Spark and Hadoop. I have a couple of questions.
1. How to execute a filter step on the driver and create an RDD of internal ids?
2. How to distribute the collected ids to multiple Spark executors?
3. How to execute Gremlin in parallel?

Thanks & Regards,
Vinayak


Re: Database Level Caching

Boxuan Li
 

Thanks Nicolas, I am able to reproduce it using your configs & script. Created an issue at https://github.com/JanusGraph/janusgraph/issues/2369

Looks like a bug with calculating cache entries' size.


Re: Janusgraph spark on yarn error

j2kupper@...
 

Thank you for the response!

I am using BulkLoaderVertexProgram from the console. Sometimes it works correctly.
The error still occurs when I run the read-from-HBase Spark job.

My read-hbase.properties:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=false
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=192.168.1.11,192.168.1.12,192.168.1.13,192.168.1.14
janusgraphmr.ioformat.conf.storage.hbase.table=testTable


spark.master=yarn
spark.submit.deployMode=client
spark.yarn.archive=/usr/local/janusgraph/janusgraph_libs.zip
spark.executor.instances=2
spark.driver.memory=8g
spark.driver.cores=4
spark.executor.cores=5
spark.executor.memory=19g

spark.executor.extraClassPath=/usr/local/janusgraph/lib:/usr/local/hadoop/etc/hadoop/conf
spark.executor.extraJavaOptions=-Djava.library.path=/usr/local/hadoop/lib/native
spark.yarn.am.extraJavaOptions=-Djava.library.path=/usr/local/hadoop/lib/native
spark.yarn.appMasterEnv.CLASSPATH=/usr/local/janusgraph/lib:/usr/local/hadoop/etc/hadoop/conf


spark.driver.extraLibraryPath=/usr/local/hadoop/lib/native
spark.executor.extraLibraryPath=/usr/local/hadoop/lib/native

spark.dynamicAllocation.enabled=false
spark.io.compression.codec=snappy
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

Can you provide some example code for a Spark application loading data the OLTP way?
Which programming language can I use? (I want Python, if possible.)


Re: reindex job is very slow on ElasticSearch and BigTable

vamsi.lingala@...
 

Thanks a lot for your reply.
I don't get any error message.

The REINDEX step is very slow (2000/s) for the mixed index and fails after running for a few days.
Every time indexing fails or the cluster is restarted, the shard and replica settings declared for the indices are reset to 1 (the default values for ES 7), and thus I have to recreate new indices after disabling the old ones.

Is there any better way to reindex fast?


Re: Janusgraph spark on yarn error

hadoopmarc@...
 

#Private reply from OP:
Yes, I am running a bulk load from HDFS (GraphSON) into janusgraph-hbase.
Yes, I have GraphSON part files from a Spark job with a structure like the grateful-dead.json example.

However, when the application master starts on a certain (the third) Hadoop node, it works well.
All nodes have identical configuration.

#Answer HadoopMarc
You do not need to use HadoopGraph for this. Indeed, there used to be a BulkLoaderVertexProgram in Apache TinkerPop, but it could not be maintained to keep working reliably across the various versions of the various graph systems. Until now, JanusGraph has not developed its own BulkLoaderVertexProgram. Also note that while there does exist an HBaseInputFormat for loading a janusgraph-hbase graph into a HadoopGraph, there does not exist an HBaseOutputFormat to write a HadoopGraph into janusgraph-hbase.

This being said, nothing is lost. You can simply write a Spark application that has individual Spark executors connect to JanusGraph in the usual (OLTP) way and load data with the usual graph.traversal() API, that is, using the addV(), addE() and properties() traversal steps. Of course, you could also try to copy the old code for the BulkLoaderVertexProgram into your project, but I believe the way I sketched is conceptually simpler and less error prone.
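
For illustration, a bare-bones Java sketch of such a Spark application (the HDFS path, properties file, vertex label and property key are made-up examples):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class OltpBulkLoad {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("oltp-bulk-load");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // One vertex per input line; a real job would parse its own
            // GraphSON or CSV records here.
            sc.textFile("hdfs:///user/root/vertices.txt")
              .foreachPartition(lines -> {
                  // Each executor partition connects to JanusGraph in the
                  // usual OLTP way; no HadoopGraph involved.
                  JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-hbase.properties");
                  GraphTraversalSource g = graph.traversal();
                  long n = 0;
                  while (lines.hasNext()) {
                      g.addV("item").property("name", lines.next()).iterate();
                      if (++n % 1000 == 0) {
                          g.tx().commit();  // commit in batches to limit transaction size
                      }
                  }
                  g.tx().commit();
                  graph.close();
              });
        }
    }
}

Committing in batches trades transaction overhead against the amount of work lost when a partition is retried.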

I seem to remember that there exist some blog series about using JanusGraph at scale, but I do not have them at hand and will look for them later on. If you find these blogs yourself, please post the links!

Best wishes,      Marc


Re: reindex job is very slow on ElasticSearch and BigTable

hadoopmarc@...
 

Thanks for reposting your issue on the janusgraph-users list!

Can you please show the entire stack trace leading to your error message?

Note that your issue might be related to:
https://github.com/JanusGraph/janusgraph/issues/2201

Marc


reindex job is very slow on ElasticSearch and BigTable

vamsi.lingala@...
 

We have imported around 4 billion vertices into JanusGraph.
We are using BigTable and Elasticsearch.

The reindexing speed is very slow, around 2000 records per second.
Is there any way to speed it up?

MapReduceIndexManagement is not supported for BigTable


Re: Janusgraph spark on yarn error

hadoopmarc@...
 

Hi

OK, do I understand correctly that you want to bulk load data from HDFS into janusgraph-hbase? Nothing wrong with that requirement; I do not know how to ask this in a friendlier way!

Is your input data really in GraphSON format? (it is difficult to get this right!)

With that established, we can see further, because this is a broad subject.

Marc


Janusgraph spark on yarn error

j2kupper@...
 

Hi!

I have this configuration

janusgraph: 0.5.2
spark 2.4.0
hbase 2.1.5
hadoop 2.7.7

I have 3 Hadoop nodes in my cluster.
I set up JanusGraph with the Hadoop infrastructure and ran Spark jobs to load and read data, but I get this error:

21/01/18 17:27:25 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/01/18 17:27:26 INFO yarn.ApplicationMaster: Preparing Local resources
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to org.apache.hadoop.hbase.shaded.com.google.protobuf.Message
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:226)
    at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:776)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2117)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$8$$anonfun$apply$3.apply(ApplicationMaster.scala:220)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$8$$anonfun$apply$3.apply(ApplicationMaster.scala:217)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$8.apply(ApplicationMaster.scala:217)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$8.apply(ApplicationMaster.scala:182)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:773)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:772)
    at org.apache.spark.deploy.yarn.ApplicationMaster.<init>(ApplicationMaster.scala:182)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:796)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:827)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)

This is my hadoop-load.properties

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./files.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
gremlin.spark.persistContext=false

spark.master=yarn
spark.yarn.archive=hdfs:///user/root/janusgraph_libs.zip

spark.yarn.maxAppAttempts=5
spark.executor.instances=2
spark.shuffle.service.enabled=false
spark.driver.memory=4g
spark.driver.cores=4
spark.executor.cores=5
spark.executor.memory=19g
spark.executor.extraClassPath=/usr/local/janusgraph/lib/*:/usr/local/hadoop/etc/hadoop/conf:/usr/local/spark/conf:/usr/local/hbase/conf
spark.executor.extraJavaOptions=-Djava.library.path=/usr/local/hadoop/lib/native
spark.yarn.am.extraJavaOptions=-Djava.library.path=/usr/local/hadoop/lib/native
spark.dynamicAllocation.enabled=false
spark.io.compression.codec=snappy
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
spark.hadoop.home.dir=/usr/local/hadoop
spark.hadoop.cloneConf=true


How can I fix it?
Thank you


Re: Database Level Caching

Nicolas Trangosi
 

Hi Boxuan,
The issue seems to occur when edge properties are retrieved: the cache has the expected size with g.V().outE().id() and not when I do g.V().outE().valueMap();

I am able to reproduce it with the following Groovy script:
  • gremlin console (launched with JAVA_OPTS="-Xmx1G -Xms1G" ./bin/gremlin.sh) on JG 0.5.3
  • conf/janusgraph-cache.properties:
gremlin.graph=org.janusgraph.core.JanusGraphFactory

storage.backend=cql
storage.hostname=127.0.0.1
storage.port=9042
schema.default=logging

cache.db-cache: true
cache.db-cache-size: 50000000
cache.db-cache-time: 6000000

  • groovy script:

graph = JanusGraphFactory.open('conf/janusgraph-cache.properties')
g = graph.traversal()


// Schema creation
graph.tx().rollback()
mgmt = g.getGraph().openManagement()

try {
    deviceLabel = mgmt.makeVertexLabel('device').make()
    nameProperty = mgmt.makePropertyKey("name").dataType(java.lang.String).cardinality(org.janusgraph.core.Cardinality.SINGLE).make()
    mgmt.addProperties(deviceLabel, nameProperty)

    measurementLabel = mgmt.makeEdgeLabel('measurement').unidirected().make()
    deviceNameProperty = mgmt.makePropertyKey("deviceName").dataType(java.lang.String).cardinality(org.janusgraph.core.Cardinality.SINGLE).make()
    physicalQuantityProperty = mgmt.makePropertyKey("physicalQuantity").dataType(java.lang.String).cardinality(org.janusgraph.core.Cardinality.SINGLE).make()
    valueProperty = mgmt.makePropertyKey("value").dataType(java.lang.Double).cardinality(org.janusgraph.core.Cardinality.SINGLE).make()
    timestampProperty = mgmt.makePropertyKey("timestamp").dataType(java.util.Date).cardinality(org.janusgraph.core.Cardinality.SINGLE).make()

    mgmt.addProperties(measurementLabel, deviceNameProperty, physicalQuantityProperty, valueProperty, timestampProperty)
    mgmt.buildIndex("deviceByName", Vertex.class).indexOnly(deviceLabel).addKey(nameProperty).buildCompositeIndex();

    //mgmt.buildEdgeIndex(measurementLabel, 'measurementByTimestamp', Direction.OUT, Order.decr, timestampProperty);

    mgmt.commit()
} catch (Exception e) {
    mgmt.rollback();
    throw e;
}

// Load data
random = new Random();
startTs = System.currentTimeMillis();
for (i = 0; i < 100; i++) {
   deviceId = g.addV("device").property("name", "device-" + i).id().next();
   for (k = 0; k < 5000; k++) {
       g.V(deviceId).addE("measurement").
           property("deviceName",  "device-" + i).
           property("physicalQuantity", "physicalQuantity-" + random.nextInt(10)).
           property("value", random.nextDouble()).
           property("timestamp", new Date(startTs + k * 1000)).
           iterate();
       if (k % 1000 == 0) {
           g.tx().commit();
       }
   }
   log.info("Done i={}",i);
}
g.tx().commit();

// Request data 
for (i = 0; i < 100; i++) {
   measurementsList = g.V().has("device", "name", "device-" + i).outE().valueMap().toList();
   log.info("Got {} measurements for {}", measurementsList.size(), i);
}
g.tx().commit();



On Sat, Jan 9, 2021 at 05:21, BO XUAN LI <liboxuan@...> wrote:
Hi Nicolas,

Looks interesting. Your configs look fine and I couldn’t reproduce your problem. Could you provide some sample code to reproduce it?

Best regards,
Boxuan


On Jan 4, 2021, at 10:20 PM, Nicolas Trangosi <nicolas.trangosi@...> wrote:

Hi Boxuan,

I have configured janusgraph with:

cache.db-cache-time: 600000  
cache.db-cache: true  
cache.db-cache-size: 50000000  
index.search.elasticsearch.create.ext.number_of_replicas: 0
storage.buffer-size: 1024
index.search.elasticsearch.create.ext.number_of_shards: 1
cache.cache.db-cache-time: 0
index.search.index-name: dcbrain
index.search.backend: elasticsearch
storage.port: 9042
ids.block-size: 1000000
schema.default: logging
storage.cql.batch-statement-size: 50
index.search.hostname: dfe-elasticsearch
storage.backend: cql
storage.hostname: dfe-cassandra
storage.cql.local-max-requests-per-connection: 4096
index.search.port: 9200


I have loaded some data into the graph and dumped the memory.
When I import this dump into jvisualVM, the retained size for ExpirationKCVSCache is 257 MB, while the limit should be 50 MB.
<image.png>

Regards,
Nicolas

On Mon, Jan 4, 2021 at 13:11, BO XUAN LI <liboxuan@...> wrote:
Hi Nicolas,

Can you provide your configurations and the memory usage you observed?

Regards,
Boxuan

On Jan 4, 2021, at 3:44 PM, Nicolas Trangosi <nicolas.trangosi@...> wrote:

Hi,
I am trying to use Database Level Caching as described in https://docs.janusgraph.org/basics/cache/, but it seems to use more memory than the configured threshold (cache.db-cache-size). Does anyone use this feature? Is it production ready?

Regards,
Nicolas




Re: Hadoop and Spark not working with Janusgraph

hadoopmarc@...
 

Hi Vinayak,

JanusGraph itself has Spark as a dependency, and the version of the Spark jars in the janusgraph/lib folder must match the version of your Spark cluster.

Best wishes,   Marc


Hadoop and Spark not working with Janusgraph

Vinayak Bali <vinayakbali16@...>
 



---------- Forwarded message ---------
From: Vinayak Bali <vinayakbali16@...>
Date: Wed, 13 Jan 2021, 7:42 pm
Subject: Hadoop and Spark not working with Janusgraph
To: <janusgraph-users@...>


Hi,
I installed and configured Apache Hadoop (3.3.0) and Apache Spark (3.0.1) with JanusGraph (0.4.0). When I try to execute a query, it does not work. Earlier it worked with native Spark, but it took a long time (minutes) to return a single record. How can the time be optimized, and how can I make it run at all?

gremlin> hdfs
==>storage[DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-568319645_1, ugi=fusionops (auth:SIMPLE)]]]
gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cql-standalone-cluster.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g=graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().limit(1).valueMap()
13:52:00 WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
13:52:00 WARN  org.apache.spark.SparkContext  - Another SparkContext is being constructed (or threw an exception in its constructor).  This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52)
org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60)
org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:313)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
13:53:00 ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend  - Application has been killed. Reason: All masters are unresponsive! Giving up.
13:53:00 WARN  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend  - Application ID is not initialized yet.
13:53:00 WARN  org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint  - Drop UnregisterApplication(null) because has not yet connected to master
13:53:00 ERROR org.apache.spark.SparkContext  - Error initializing SparkContext.
java.lang.NullPointerException

read-cql-standalone-cluster.properties

# Copyright 2020 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true

#
# JanusGraph Cassandra InputFormat configuration
#
# These properties defines the connection properties which were used while write data to JanusGraph.
janusgraphmr.ioformat.conf.storage.backend=cql
# This specifies the hostname & port for Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042
# This specifies the keyspace where data is stored.
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
# This defines the indexing backend configuration used while writing data to JanusGraph.
janusgraphmr.ioformat.conf.index.search.backend=elasticsearch
janusgraphmr.ioformat.conf.index.search.hostname=127.0.0.1
# Use the appropriate properties for the backend when using a different storage backend (HBase) or indexing backend (Solr).

#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true

#
# SparkGraphComputer Configuration
#
spark.master=spark://127.0.0.1:7077
spark.executor.memory=1g
spark.executor.extraClassPath=/opt/lib/janusgraph/*
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

Thanks & Regards,
Vinayak


[ANNOUNCEMENT] JanusGraph joins LF AI & Data umbrella

"alex...@gmail.com" <alexand...@...>
 

On behalf of the JanusGraph Technical Steering Committee, we are excited to join LF AI & Data Foundation!


We are now moving mailing lists to a new home under LF AI & Data. Mailing lists hosted on Google Groups will now be considered obsolete and will be available in read-only mode. All messages from Google Groups mailing lists will be moved to new mailing lists in the next few days.


We kindly ask everyone to switch to new mailing lists:

  • JanusGraph Users: for questions about using JanusGraph, installation, configuration, and integrations. First-time posts are moderated and may not be visible immediately.

  • JanusGraph Dev: for discussion on internal implementation details of JanusGraph itself. Questions about using JanusGraph, installation, configuration, and integrations should be posted on janusgraph-users. First-time posts are moderated and may not be visible immediately.

  • JanusGraph Announce: for new releases and news announcements.


In addition to the mailing lists, we are continuing to use the following collaboration tools, with no changes:


Best regards,

Oleksandr Porunov

on behalf of JanusGraph TSC


Re: Could not alias [g] to [g] as [g]

HadoopMarc <bi...@...>
 

Hi,

Unfortunately, the JanusGraph ref docs do not have full documentation for using ScyllaDB, but in order to use ScyllaDB you have to make sure the JanusGraph configs are similar to the janusgraph-cql-es-server.properties from the JanusGraph distribution. When comparing janusgraph-cassandra-es-server.properties and janusgraph-cql-es-server.properties, you will see that in many places "cassandra" is replaced with "cql".

See also the scylla docs, which state that compatibility holds when using the cql protocol:

https://docs.scylladb.com/using-scylla/integrations/integration-janus/

Best wishes,     Marc
On Monday, January 11, 2021 at 5:53:15 PM UTC+1, ya...@... wrote:

When I replaced Cassandra with ScyllaDB it stopped working. Could it be that ScyllaDB is not compatible with either JanusGraph or Elasticsearch?

On Monday, January 11, 2021 at 5:34:06 PM UTC+1 Yamiteru XYZ wrote:
So it looks like the docker-compose-cql-es.yml works without a problem.

On Sunday, January 10, 2021 at 4:30:54 PM UTC+1 HadoopMarc wrote:
Hi,

Can you also try with the tested docker-compose-cql-es.yml from:


It has subtle differences from the yml file you listed. If you have a working situation to compare with, it is easier to find what is wrong. If it does not work, it is easy for others to reproduce the issue.

Best wishes,    Marc
On Sunday, January 10, 2021 at 1:17:50 PM UTC+1, ya...@... wrote:
Hi,

- yes, I can establish a ws connection to my server with a general ws test client.

- the g in javascript prints this value:
GraphTraversalSource {
  graph: Graph {},
  traversalStrategies: TraversalStrategies { strategies: [ [RemoteStrategy] ] },
  bytecode: Bytecode { sourceInstructions: [], stepInstructions: [] },
  graphTraversalSourceClass: [class GraphTraversalSource],
  graphTraversalClass: [class GraphTraversal extends Traversal]
}

- when I try to print count like this I get no value at all:
g.V().count().toList()
  .then(v => console.log("-- COUNT --", v))
  .catch(e => console.log("-- ERROR --", e));

When I use "ws://46.36.36.121:8182/gremlin" instead of "ws://localhost:8182/gremlin" from my local machine I get the same "Could not alias [g] to [g] as [g] not in the Graph or TraversalSource global bindings" error.

Miroslav.

On Thursday, January 7, 2021 at 5:08:12 PM UTC+1 HadoopMarc wrote:
OK, I see that you run gremlin-npm locally on your server. Are you sure that localhost is exposed in the gremlin server? What happens if you connect from your local machine towards the ip address of the sever?

Marc

On Thursday, January 7, 2021 at 5:02:44 PM UTC+1, HadoopMarc wrote:
Hi

I am lost in the stream of messages. I gather that:
  • you can establish a ws connection to your server with a general ws test client
  • you can run a "const g = traversal().withRemote(remote);" in javascript without error (does g indeed have a value other than null?)
  • you try to export the GraphTraversalSource g and use it in some other module: how do you know it does not work?
Can you try to print some debug statement using g in the module where it is defined? Look here for inspiration for a query using g:

Best wishes,     Marc


On Thursday, January 7, 2021 at 3:50:02 PM UTC+1, ya...@... wrote:
Can you help me? It's really frustrating and I cannot find a solution. I can even give you ssh access so you can try to find a solution. Anything. I'm desperate.

On Wednesday, January 6, 2021 at 4:25:10 PM UTC+1 Yamiteru XYZ wrote:
It wasn't solved, as stated in later posts.

On Wednesday, January 6, 2021 at 3:27:08 PM UTC+1 HadoopMarc wrote:
Thanks for posting back that the issue was solved!

On Wednesday, January 6, 2021 at 12:02:38 PM UTC+1, ya...@... wrote:
I use this code on my server:

import { process, driver } from "gremlin";


const url = 'ws://localhost:8182/gremlin';
const traversal = process.AnonymousTraversalSource.traversal;
const DriverRemoteConnection = driver.DriverRemoteConnection;
const DriverClientConnection = driver.Client;
const remote = new DriverRemoteConnection(url);
const client = new DriverClientConnection(url);


export const g = traversal().withRemote(remote);
export const c = client;
export const T = process.t;
export const __ = process.statics;

export default g;

On Tuesday, January 5, 2021 at 2:33:55 PM UTC+1 HadoopMarc wrote:
OK, I understand now what you mean with "I see this error only on my server". What happens if you try to open a websocket connection with a test websocket client, e.g. for firefox:


It might be that the server does not have an open 8182 port. For less wild guesses, I really need more information (client stacktrace, server logs, etc.).

Best wishes,     Marc
On Tuesday, January 5, 2021 at 12:38:27 PM UTC+1, ya...@... wrote:
I'm using the docker version and I haven't changed anything. Where do I find the yaml config? I've tested it on 2 PCs (ubuntu and windows) and it works without a problem. It fails only on the server (ubuntu).

On Monday, January 4, 2021 at 5:37:37 PM UTC+1 HadoopMarc wrote:
Hi,

Can you show the gremlin server yaml config file (part of janusgraph) as well as the groovy script file it refers to? The groovy script should define the GraphTraversalSource g that you want to bind to.

Best wishes,     Marc

On Sunday, January 3, 2021 at 11:56:20 PM UTC+1, ya...@... wrote:
Hi, I have JanusGraph with Scylla and Elasticsearch in Docker. I'm connecting to JanusGraph on the backend using gremlin.

I see this error only on my server and not on my local machine. What does it mean and more importantly how do I fix it? I'm a noob when it comes to backend and database stuff so please be kind. Thank you.



Re: docker base tests for scylladb stopped working

Israel Fruchter <fr...@...>
 

Raised it in 
https://github.com/JanusGraph/janusgraph/issues/2351

On Wednesday, January 6, 2021 at 4:56:56 PM UTC+2 Israel Fruchter wrote:
Hi Jan,

Can you elaborate on the method? As far as I've seen, the new check inspects the name of the repo and expects it to start with the string `cassandra`;
otherwise it fails and does not run.

In our case the repo name is `scylladb`, hence it doesn't match.

Can we revert the new version of testcontainers until we can figure out a way around this?

On Tuesday, January 5, 2021 at 6:06:29 PM UTC+2 fa...@... wrote:
Hi
If you like, you could open an issue for it.

We already have a flag to run tests against different versions of Cassandra; you could add Scylla to this test, if you like.

Greetings,
Jan

On 5. Jan 2021, at 15:58, Israel Fruchter <f...@...> wrote:



A closer look suggests there is a new PR

that breaks using the scylla docker image as a drop-in replacement in the tests,
since its name doesn't start with "cassandra"...

Running the test suite with the cassandra docker image does work...

Any ideas on how this can be solved?
On Tuesday, January 5, 2021 at 4:55:27 PM UTC+2 Israel Fruchter wrote:
It seems like this is the commit that breaks the code:

commit f52250f092cfe310916a6a9e743a4ba2a4b62607 (HEAD)
Author: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Date:   Mon Dec 28 11:14:35 2020 +0000

    Bump testcontainers.version from 1.14.3 to 1.15.1
    
    Bumps `testcontainers.version` from 1.14.3 to 1.15.1.
    
    Updates `testcontainers` from 1.14.3 to 1.15.1
    
    Updates `elasticsearch` from 1.14.3 to 1.15.1
    
    Updates `cassandra` from 1.14.3 to 1.15.1
    
    Updates `junit-jupiter` from 1.14.3 to 1.15.1
    
    Signed-off-by: dependabot-preview[bot] <s...@...>



On Tuesday, January 5, 2021 at 12:16:14 PM UTC+2 Israel Fruchter wrote:
Recently the docker-based tests for scylladb stopped working.
The last one confirmed working (in our CI) was c97e84ef401d5a17c4c0b37c1af5fdad06db06fd.

How can I figure out what the issue is?

$ java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

$ mvn clean install -pl janusgraph-cql -Pscylladb

[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.085 s <<< FAILURE! - in org.janusgraph.diskstorage.cql.CQLDistributedStoreManagerTest                                              
[ERROR] org.janusgraph.diskstorage.cql.CQLDistributedStoreManagerTest  Time elapsed: 0.085 s  <<< ERROR!                                                                                                     
java.lang.ExceptionInInitializerError                                                                                                                                                                        
        at org.janusgraph.diskstorage.cql.CQLDistributedStoreManagerTest.<clinit>(CQLDistributedStoreManagerTest.java:37)
