Performance issue on JanusGraph with Elasticsearch

naresh...@...
 

Hi,
I am facing performance problems for two reasons:
1) Vertices with 40 properties.
2) A mixed index on just 2 properties.

Initially I had only 10 properties and no mixed index, and JanusGraph performance was good. Now the properties have increased to 40, with a mixed index on just 2 of them (because of date-range queries), and performance is very bad. Is there any way to improve performance here?
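One way to narrow this down (a sketch; the 'createdAt' key and the date variables are hypothetical, not from the original post) is to profile the slow date-range traversal and check whether the Elasticsearch index is actually answering it:

// profile() shows whether the mixed index or a full scan answers the query
g.V().has('createdAt', gte(startDate)).has('createdAt', lt(endDate)).profile()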

Once the schema is created with the mixed index, performance is very bad even when no data is inserted into the mixed-index properties. If I create the schema without mixed indexes, it is a bit better.

Any solutions, please?

Thanks,
Naresh


[Blog] Transferring a subgraph from Janusgraph to Neo4j

HadoopMarc <bi...@...>
 


The following short blog might be of interest to JanusGraph users who provide datasets to data science teams:


Cheers,   Marc


Re: Remote JanusGraph server

Antriksh Shah <sha...@...>
 

Could you try executing g.V()....valueMap()? If you are getting the vertices back, then you can obtain their properties using valueMap().
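For illustration, a minimal traversal of this shape (the 'person' label and the limit are made up, not from the original question):

// valueMap() returns each vertex's properties as a Map;
// valueMap(true) also includes the vertex id and label
g.V().hasLabel('person').limit(5).valueMap(true)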


JanusGraph Tips and Tricks Articles

Chris Hupman <chris...@...>
 

Hi All,

I compiled the JanusGraph content I wasn't able to incorporate into my previous articles and put it into two new articles. You can find part 1 and part 2 on Medium, or on developer.ibm.com, with part 1 here and part 2 here.

The articles cover index troubleshooting, traversal binding tips, and schema creation, as well as GraphSON and GraphML imports and exports.

Hopefully the content is useful to the community.

Cheers,

Chris Hupman


Issue while adding Vertices remotely using gremlin traversal

mandarba...@...
 

I am using the gremlin-python client and have loaded a graph into ConfiguredGraphFactory, created from a template configuration.

If I do this, the vertex gets added:
gremlin_client.submit("g1=ConfiguredGraphFactory.open('airroutes');g1.addVertex();g1.vertices().size()").next()

But this does not work:
gremlin_client.submit("g1=ConfiguredGraphFactory.open('airroutes');g3=g1.traversal();g3.addV();g3.V().count()").next()

What could be the reason?
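One plausible explanation (a guess, not a confirmed diagnosis): a GraphTraversal is lazy, so g3.addV() on its own only builds the traversal, and only the last expression of a submitted script is iterated by the server. Adding a terminal step should force execution, e.g.:

# hypothetical fix: next() forces the addV() traversal to execute
gremlin_client.submit("g1=ConfiguredGraphFactory.open('airroutes');g3=g1.traversal();g3.addV().next();g3.V().count()").next()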

Also, is there any way I can connect the global 'g' to this graph, as mentioned in this blog (https://medium.com/@BGuigal/janusgraph-python-9e8d6988c36c)?
The global g doesn't refer to my graphs.

I also tried the singleton ConfiguredGraphFactory that I had in my gremlin.yaml, but the global g didn't work with it either.

Thank you
Warm Regards
Mandar



Embedded CassandraDaemon failed to initialized

Shelendra Singh <shelend...@...>
 

Hi All,
We are using JanusGraph with embedded Cassandra. During JanusGraph initialization, the Cassandra initialization sometimes fails with the error and stack trace below.

---------------------------------------------------------------------------------
[2019-06-12T12:55:23.249+0530] [CompactionExecutor:2] ERROR org.apache.cassandra.io.sstable.SSTableWriter - Failed deleting temp components for C:\server\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-tmp-ka-5
org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: C:\server\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-tmp-ka-5-Data.db: The process cannot access the file because it is being used by another process.

at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:134)
at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:120)
at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:108)
at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:365)
at org.apache.cassandra.io.sstable.SSTableRewriter.abort(SSTableRewriter.java:236)
at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:220)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73)
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.FileSystemException: C:\server\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-tmp-ka-5-Data.db: The process cannot access the file because it is being used by another process.

at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
at java.nio.file.Files.delete(Files.java:1126)
at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:130)
... 14 more
---------------------------------------------------------------------------------

This issue is intermittent but frequent.
Any help with a workaround or solution would be appreciated.

Thanks,
Shelendra Singh  



Remote JanusGraph server

rabih.h...@...
 

I am connecting to JanusGraph via Gremlin Server in remote mode
and calling a gremlin query that returns a path,

e.g.: g.V().until()....path()

The returned path objects include vertices, but without a properties field.
How can I get the properties of every vertex in remote connection mode,
given that in embedded mode the properties are included in the response?


Can We index a unique property for 2 or more labels ?

ali.ab...@...
 

I need to make a property that is unique across multiple labels.
It is obvious that we can create a unique index for a property over the whole graph, and I know we can create a unique index for a property on a specific label using the code below.
graph.tx().rollback()  //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
username = mgmt.getPropertyKey('username')
person = mgmt.getVertexLabel('person')
user = mgmt.getVertexLabel('user')
mgmt.buildIndex('byUsernameAndLabel', Vertex.class).addKey(username).indexOnly(person).unique().buildCompositeIndex()  // unique() makes the index enforce uniqueness
mgmt.commit()

What I am asking is how to make a property unique across 2 or more labels, for example:
mgmt.buildIndex('byUsernameAndManyLabels', Vertex.class).addKey(username).indexOnly(person,user).buildCompositeIndex()

I know the code above is not valid, since indexOnly takes a single JanusGraphSchemaType parameter, but it clarifies my question.
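For context, the graph-wide variant is expressible today (a sketch; note it would constrain the property for every label, not only the two of interest):

mgmt.buildIndex('byUsernameUnique', Vertex.class).addKey(username).unique().buildCompositeIndex()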

Any Ideas?


Re: Why is Janusgraph's write performance above 1 sec per transaction?

v.sure...@...
 

In our internal benchmark testing we attempted these two methods for creating Vertices and their edges:

1.) Using a GraphTraversalSource (g), add vertices on the fly by connecting to the remote Gremlin Server of a JanusGraph instance.
2.) Add vertices and edges using a JanusGraphTransaction from a JanusGraphFactory graph.

#1 really didn't work for us, for the reasons posted in this thread.

However, #2 gave us fairly good results: we were able to add ~46 million vertices using 8 threads in 50 minutes, i.e. roughly 15,333 vertices per second.

Here are the steps that elaborate on #2 (it's written in Java; posting them in case it helps anyone). A minimal sketch follows the list.

1.) Open a graph using jgfGraph=JanusGraphFactory.open(storageConfigFile).
2.) Open the management API of the graph using management = jgfGraph.openManagement() and build a composite index on one of the vertex properties.
3.) Get a transaction instance JanusGraphTransaction tx = jgfGraph.newTransaction().
4.) Start adding the vertices and persist the vertex IDs (to readily reuse them while creating the edges).
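A minimal Gremlin Console (Groovy) sketch of those four steps; the config path, property key, label, and counts are hypothetical, not from the benchmark:

graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')   // 1. open the graph

mgmt = graph.openManagement()                                        // 2. composite index on a vertex property
uid = mgmt.makePropertyKey('uid').dataType(Long.class).make()
mgmt.buildIndex('byUid', Vertex.class).addKey(uid).buildCompositeIndex()
mgmt.commit()

tx = graph.newTransaction()                                          // 3. one transaction per batch
ids = [:]                                                            // uid -> vertex id, kept for edge creation
(1..100000).each { i ->                                              // 4. add vertices, remember their IDs
    v = tx.addVertex('node')
    v.property('uid', (long) i)
    ids[i] = v.id()
}
tx.commit()

Committing in bounded batches (and clearing the in-memory ID map between batches) is what keeps the client's memory usage in check.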

The only disadvantage of #2 is that you are essentially running the APIs locally, and of course part of the processing is also handled by the storage backend (in our case HBase). We ran into memory exceptions while adding 1.4 billion edges in batches of 146 million each, as the server running the Java client exhausted its resources (CPU cycles and memory). So we are re-attempting with a reduced batch size (we still need to figure out what an ideal batch size would be).

On Friday, June 7, 2019 at 6:48:14 PM UTC+5:30, Simon wrote:
Hi Florian,

thank you for the response, it indeed helped with troubleshooting - the timings for one operation were on the order of 1-10 ms.
Now I assume that the major drawback is inefficient querying of properties (string comparison). I will now implement indexing.


On Thursday, 6 June 2019 16:35:35 UTC+2, Florian Hockmann wrote:
Did you read other similar posts? Often the reasons for such slow performance are very simple, like no indices being used, or the queries needing to be compiled first while you only measure a single request.

Here is, for example, a post that sounds similar to me:

It would be good to know in general whether you only measure a single request or perform something like 1000 requests and compare their timings.

On Thursday, 6 June 2019 at 16:04:39 UTC+2, Simon wrote:
Dear Janusgraph-experts,

I have set up JanusGraph together with Cassandra + ES in local server mode, as described in the JanusGraph documentation, on Linux Red Hat, and I am interacting with it via gremlin-python 3.3.3.

The configs used are (including start_rpc:true and cql as backend):
conf/gremlin-server/gremlin-server.yaml
conf/gremlin-server/janusgraph-cql-es-server.properties
I invoke JanusGraph via:
Bash: cassandra/bin/cassandra -f
Bash: janusgraph/bin/janusgraph.sh start

Interacting with the graph works well, but the slow performance struck me - one simple operation takes more than 1 second!

The operations look like:
g.addV('ImageID').property('position',pos).next()
g.V().hasLabel('ImageID').has('position',pos).property('year',year).next()

Any clue how I can identify the reason for this slow operation?

Best,
Simon



Re: Question about using groovy closure in OLAP mode

Abhay Pandit <abha...@...>
 

Hi,
I suspect it's a serialization issue. Try adding these 2 properties:

spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

Starting Facebook Discussion Group, Join Here:
https://www.facebook.com/groups/Janusgraph

Thanks,
Abhay

On Thu, 6 Jun 2019 at 19:34, FEI Hao <blanc...@...> wrote:
Hello all:

I have used the OLAP mode for some time and the following queries do work:

hgraph = GraphFactory.open('/hadoop.properties')
hg = hgraph.traversal().withComputer(SparkGraphComputer)
hg.V().count()

However, if a simple Groovy closure is added in the middle, a serialization error happens:

def identity(x) { return x}
hg.V().map(identity).count()


java.lang.IllegalStateException: org.apache.spark.SparkException: Task not serializable:
...
object not serializable (class: groovysh_evaluate, value: groovysh_evaluate@6ddc67d0)

The output of hgraph.configuration():
==>[spark.executor.memory,6g]
==>[spark.master,local[10]]
==>[gremlin.graph,org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph]
==>[janusgraphmr.ioformat.conf.storage.hostname,x.x.x.x]
==>[spark.executor.extraClassPath,/data/ceph/.janusgraph-0.3.1/lib/*]
==>[gremlin.hadoop.graphWriter,org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat]
==>[janusgraphmr.ioformat.conf.storage.cassandra.keyspace,db20190530_1450_20190531_183034]
==>[gremlin.hadoop.jarsInDistributedCache,false]
==>[gremlin.spark.graphStorageLevel,MEMORY_AND_DISK]
==>[cassandra.input.partitioner.class,org.apache.cassandra.dht.Murmur3Partitioner]
==>[gremlin.hadoop.scriptOutputFormat.script,/data/ceph/cb6621da-8809-11e9-9917-6c92bf5eb70e.groovy]
==>[gremlin.spark.persistStorageLevel,DISK_ONLY]
==>[gremlin.hadoop.outputLocation,file:///data/ceph/cb6621da-8809-11e9-9917-6c92bf5eb70e]
==>[gremlin.spark.persistContext,true]
==>[janusgraphmr.ioformat.conf.storage.port,9160]
==>[janusgraphmr.ioformat.conf.storage.backend,cassandra]
==>[gremlin.hadoop.graphReader,org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat]
==>[gremlin.hadoop.inputLocation,none]

Since I'm using the local[10] master, there should (I think) be no jar/Java version issues.
I think the root cause is the global groovysh_evaluate variable, which is introduced by the owner/delegate of my closure. I am not a Groovy expert; my question is how to make my Groovy closure serializable in this setup.

Thank you for any help.



Re: Why is Janusgraph's write performance above 1 sec per transaction?

Simon <kettere...@...>
 

Hi Florian,

thank you for the response, it indeed helped with troubleshooting - the timings for one operation were on the order of 1-10 ms.
Now I assume that the major drawback is inefficient querying of properties (string comparison). I will now implement indexing.


On Thursday, 6 June 2019 16:35:35 UTC+2, Florian Hockmann wrote:
Did you read other similar posts? Often the reasons for such slow performance are very simple, like no indices being used, or the queries needing to be compiled first while you only measure a single request.

Here is, for example, a post that sounds similar to me:

It would be good to know in general whether you only measure a single request or perform something like 1000 requests and compare their timings.

On Thursday, 6 June 2019 at 16:04:39 UTC+2, Simon wrote:
Dear Janusgraph-experts,

I have set up JanusGraph together with Cassandra + ES in local server mode, as described in the JanusGraph documentation, on Linux Red Hat, and I am interacting with it via gremlin-python 3.3.3.

The configs used are (including start_rpc:true and cql as backend):
conf/gremlin-server/gremlin-server.yaml
conf/gremlin-server/janusgraph-cql-es-server.properties
I invoke JanusGraph via:
Bash: cassandra/bin/cassandra -f
Bash: janusgraph/bin/janusgraph.sh start

Interacting with the graph works well, but the slow performance struck me - one simple operation takes more than 1 second!

The operations look like:
g.addV('ImageID').property('position',pos).next()
g.V().hasLabel('ImageID').has('position',pos).property('year',year).next()

Any clue how I can identify the reason for this slow operation?

Best,
Simon



Re: Not sure if vertex centric index is being used

hardy arora <hardi...@...>
 

I have the same question.


On Wednesday, 10 October 2018 05:53:36 UTC-4, m...@... wrote:
It does say isFitted=true in the profile output, but it doesn't mention the relation index name like it does in the case of composite indexes.
Is it that for vertex-centric indexes the index name is not displayed in the profile output? Is this a bug?

gremlin> g.V().has('project', 'projectId', 138).outE('hasHouseUnit').has('hasHouseUnitBlock', 'C').profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(project), projectI...                     1           1           0.641    37.98
    \_condition=(~label = project AND projectId = 138)
    \_isFitted=false
    \_query=multiKSQ[1]@2147483647
    \_index=byProjectIdComposite
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0.021
  optimization                                                                                 0.212
JanusGraphVertexStep([hasHouseUnitBlock.eq(C)])                      679         679           1.048    62.02
    \_condition=(hasHouseUnitBlock = C AND type[hasHouseUnit])
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@85318457
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0.074
                                            >TOTAL                     -           -           1.690        -




Re: Why is Janusgraph's write performance above 1 sec per transaction?

Florian Hockmann <f...@...>
 

Did you read other similar posts? Often the reasons for such slow performance are very simple, like no indices being used, or the queries needing to be compiled first while you only measure a single request.

Here is, for example, a post that sounds similar to me:
https://groups.google.com/forum/#!topic/janusgraph-users/7rSo0lLaDTk

It would be good to know in general whether you only measure a single request or perform something like 1000 requests and compare their timings.
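A rough way to do that from the Gremlin Console (a sketch reusing Simon's label and property key; 42 is a dummy value):

// time 1000 executions of the same read to average out
// compilation and connection warm-up costs
start = System.currentTimeMillis()
(1..1000).each { g.V().hasLabel('ImageID').has('position', 42).tryNext() }
println "${(System.currentTimeMillis() - start) / 1000.0} ms per request"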

On Thursday, 6 June 2019 at 16:04:39 UTC+2, Simon wrote:

Dear Janusgraph-experts,

I have set up JanusGraph together with Cassandra + ES in local server mode, as described in the JanusGraph documentation, on Linux Red Hat, and I am interacting with it via gremlin-python 3.3.3.

The configs used are (including start_rpc:true and cql as backend):
conf/gremlin-server/gremlin-server.yaml
conf/gremlin-server/janusgraph-cql-es-server.properties
I invoke JanusGraph via:
Bash: cassandra/bin/cassandra -f
Bash: janusgraph/bin/janusgraph.sh start

Interacting with the graph works well, but the slow performance struck me - one simple operation takes more than 1 second!

The operations look like:
g.addV('ImageID').property('position',pos).next()
g.V().hasLabel('ImageID').has('position',pos).property('year',year).next()

Any clue how I can identify the reason for this slow operation?

Best,
Simon



Why is Janusgraph's write performance above 1 sec per transaction?

Simon <simon....@...>
 

Dear Janusgraph-experts,

I have set up JanusGraph together with Cassandra + ES in local server mode, as described in the JanusGraph documentation, on Linux Red Hat, and I am interacting with it via gremlin-python 3.3.3.

The configs used are (including start_rpc:true and cql as backend):
conf/gremlin-server/gremlin-server.yaml
conf/gremlin-server/janusgraph-cql-es-server.properties
I invoke JanusGraph via:
Bash: cassandra/bin/cassandra -f
Bash: janusgraph/bin/janusgraph.sh start

Interacting with the graph works well, but the slow performance struck me - one simple operation takes more than 1 second!

The operations look like:
g.addV('ImageID').property('position',pos).next()
g.V().hasLabel('ImageID').has('position',pos).property('year',year).next()

Any clue how I can identify the reason for this slow operation?

Best,
Simon



Question about using groovy closure in OLAP mode

FEI Hao <blanc...@...>
 

Hello all:

I have used the OLAP mode for some time and the following queries do work:

hgraph = GraphFactory.open('/hadoop.properties')
hg = hgraph.traversal().withComputer(SparkGraphComputer)
hg.V().count()

However, if a simple Groovy closure is added in the middle, a serialization error happens:

def identity(x) { return x}
hg.V().map(identity).count()


java.lang.IllegalStateException: org.apache.spark.SparkException: Task not serializable:
...
object not serializable (class: groovysh_evaluate, value: groovysh_evaluate@6ddc67d0)

The output of hgraph.configuration():
==>[spark.executor.memory,6g]
==>[spark.master,local[10]]
==>[gremlin.graph,org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph]
==>[janusgraphmr.ioformat.conf.storage.hostname,x.x.x.x]
==>[spark.executor.extraClassPath,/data/ceph/.janusgraph-0.3.1/lib/*]
==>[gremlin.hadoop.graphWriter,org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat]
==>[janusgraphmr.ioformat.conf.storage.cassandra.keyspace,db20190530_1450_20190531_183034]
==>[gremlin.hadoop.jarsInDistributedCache,false]
==>[gremlin.spark.graphStorageLevel,MEMORY_AND_DISK]
==>[cassandra.input.partitioner.class,org.apache.cassandra.dht.Murmur3Partitioner]
==>[gremlin.hadoop.scriptOutputFormat.script,/data/ceph/cb6621da-8809-11e9-9917-6c92bf5eb70e.groovy]
==>[gremlin.spark.persistStorageLevel,DISK_ONLY]
==>[gremlin.hadoop.outputLocation,file:///data/ceph/cb6621da-8809-11e9-9917-6c92bf5eb70e]
==>[gremlin.spark.persistContext,true]
==>[janusgraphmr.ioformat.conf.storage.port,9160]
==>[janusgraphmr.ioformat.conf.storage.backend,cassandra]
==>[gremlin.hadoop.graphReader,org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat]
==>[gremlin.hadoop.inputLocation,none]

Since I'm using the local[10] master, there should (I think) be no jar/Java version issues.
I think the root cause is the global groovysh_evaluate variable, which is introduced by the owner/delegate of my closure. I am not a Groovy expert; my question is how to make my Groovy closure serializable in this setup.
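One possible direction (an untested sketch, not a confirmed fix): TinkerPop's string-based Lambda helper carries the closure body as a script and recompiles it on the executors, so nothing from the local groovysh context has to be serialized:

import org.apache.tinkerpop.gremlin.util.function.Lambda

// the lambda travels as a plain string, avoiding the groovysh_evaluate owner reference
hg.V().map(Lambda.function("it.get()")).count()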

Thank you for any help.


Passing custom Normalizer to ES

shrikant pachauri <sk.pa...@...>
 

Hi all! I was wondering how to pass a custom normalizer to ES. I want to sort text in a case-insensitive manner; in short, I want to use the third method for this purpose, as described in
https://www.technetexperts.com/web/case-insensitive-sorting-in-elasticsearch/

At the end of https://docs.janusgraph.org/latest/field-mapping.html it is written that you can pass a normalizer as a custom parameter. Can anyone explain how to pass it?
Or is there any other way to achieve case-insensitive sorting using ES?
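For what it's worth, a speculative sketch based on the custom-parameter mechanism that page describes (the index, key, and normalizer names here are made up, and the normalizer itself must already be defined in the Elasticsearch index settings):

import org.janusgraph.core.schema.Mapping
import org.janusgraph.core.schema.Parameter

mgmt = graph.openManagement()
name = mgmt.makePropertyKey('name').dataType(String.class).make()
// Mapping.STRING stores the field as an ES keyword; the string-analyzer
// parameter is reportedly applied as a normalizer for keyword fields
mgmt.buildIndex('byNameMixed', Vertex.class)
    .addKey(name, Mapping.STRING.asParameter(), Parameter.of('string-analyzer', 'my_lowercase_normalizer'))
    .buildMixedIndex('search')
mgmt.commit()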


Regards,
Shrikant


[ANNOUNCE] JanusGraph 0.2.3 Release

chris...@...
 

I am excited to announce the release of JanusGraph 0.2.3.

JanusGraph is an Apache TinkerPop enabled property graph database with support for a variety of storage and indexing backends. Thank you to all of the contributors.

JanusGraph 0.2.3 is a fix release, so it contains only bug fixes and patch-level updates to the dependencies.

The release artifacts can be found at this location:
        https://github.com/JanusGraph/janusgraph/releases/tag/v0.2.3

A binary distribution is provided for user convenience:
        https://github.com/JanusGraph/janusgraph/releases/download/v0.2.3/janusgraph-0.2.3-hadoop2.zip

The online docs can be found here:
        https://docs.janusgraph.org/0.2.3/index.html

To view the resolved issues and commits, check the milestone here:

Chris Hupman


Re: Error HADOOP_HOME not set while trying to run gremlin console for the first time

Alex Maier <alexand...@...>
 

OK, it appears that WinUtils is a set of Hadoop binaries for Windows, compiled and available at https://github.com/steveloughran/winutils . I guess it is not in any case a full Hadoop installation on Windows (Hadoop, it seems to me, does not run on Windows), but a set of utilities that carries the Hadoop name, much as JanusGraph carries the word Hadoop although it can run on many other storage backends. So the naming is quite confusing; it would be nice to document the availability and necessity of winutils.exe.

As a result of all this, the gremlin prompt appeared. Now I can go further.

Alex


Re: Error HADOOP_HOME not set while trying to run gremlin console for the first time

HadoopMarc <bi...@...>
 

Hi Alex,

I have no experience with JanusGraph on Windows, but from the janusgraph-0.3.1 gremlin.bat:
:: Hadoop winutils.exe needs to be available because hadoop-gremlin is installed and active by default
IF NOT DEFINED HADOOP_HOME (
    SET JANUSGRAPH_WINUTILS=%JANUSGRAPH_HOME%\bin\winutils.exe
    IF EXIST !JANUSGRAPH_WINUTILS! (
        SET HADOOP_HOME=%JANUSGRAPH_HOME%
    ) ELSE (
        ECHO HADOOP_HOME is not set.
        ECHO Download http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe
        ECHO Place it under !JANUSGRAPH_WINUTILS!
        PAUSE
        GOTO :eof
    )
)

This script tells you to install the winutils.exe utility. Did you get any other error messages?

Cheers,    Marc


On Monday, 3 June 2019 at 14:54:33 UTC+2, Alex Maier wrote:

Hi!

I have posted these SO questions:

Maybe someone can provide answers? Essentially: I have Cassandra running on Windows 10, I have downloaded and unzipped JanusGraph, and now I am trying to run the Gremlin Console Windows bat file. But it requires HADOOP_HOME to be set. Where can I configure JanusGraph/Gremlin so that they don't require HADOOP_HOME and a Hadoop installation, but use Cassandra instead?

I had a look inside gremlin.bat, and it is quite strange that it requires HADOOP_HOME unconditionally?!

I am trying to follow https://www.bluepiit.com/blog/janusgraph-with-cassandra/ and it does not say that I need to indicate the storage type in some global JanusGraph configuration file. Why is all the JanusGraph documentation so silent and unclear about such a basic aspect as using non-Hadoop storage?

:(
Alex


ClassNotFoundException when reindex using MapReduce

Chen Wu <cjx...@...>
 

Hi, I'm trying to do a reindex using MapReduce, following the steps listed in https://docs.janusgraph.org/latest/index-admin.html.
In the Gremlin Console, I input the following commands:

mgmt = graph.openManagement()
import org.janusgraph.hadoop.MapReduceIndexManagement
mr = new MapReduceIndexManagement(graph)
mr.updateIndex(mgmt.getGraphIndex('allMixedIndex'), SchemaAction.REINDEX).get()

However, the following error showed up:

Error: java.lang.ClassNotFoundException: org.janusgraph.diskstorage.keycolumnvalue.scan.ScanMetrics
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2180)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2145)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2239)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:187)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)

Are there any extra configurations to set in the properties file (such as spark.executor.extraClassPath when using Spark for OLAP) or somewhere else?
It seems that the org.janusgraph.diskstorage.keycolumnvalue.scan.ScanMetrics class is in janusgraph-core-0.3.1.jar; how can I add the JanusGraph jars to the classpath of the MapReduce job?
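One commonly suggested approach (untested here, and the paths are hypothetical): make the JanusGraph lib directory visible both to the client JVM and to the task containers, e.g. on a YARN cluster:

# client side, before starting the Gremlin Console
export HADOOP_CLASSPATH="$JANUSGRAPH_HOME/lib/*"

# task side: extend mapreduce.application.classpath in mapred-site.xml
# (the jars must exist at that path on every worker node)
<property>
  <name>mapreduce.application.classpath</name>
  <value>...existing entries...,/opt/janusgraph-0.3.1/lib/*</value>
</property>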

Any idea would be greatly appreciated~
