Re: Cache expiration time
Ohad Pinchevsky <ohad.pi...@...>
On Tuesday, August 8, 2017 at 6:02:17 PM UTC+3, Jason Plurad wrote:
According to the docs, it is a GLOBAL_OFFLINE configuration setting: "These options can only be changed for the entire database cluster at once when all instances are shut down." You'll need to set the value using the ManagementSystem. If you want to do it through a remote console session, you could try something like this:
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
gremlin> :> mgmt = graph.openManagement(); mgmt.set('cache.db-cache-time', 360000); mgmt.commit(); true
==>true
At this point, the value is set but it is not active. You need to restart the Gremlin Server so the new configuration is picked up. Another thing you should be aware of when working with GLOBAL_OFFLINE properties is that you can't change the value if there are multiple open graph instances -- for example, you have the Gremlin Server started and also make a direct connection with JanusGraphFactory.open(). You should shut down all connections so there is only 1 remaining (you can verify with mgmt.getOpenInstances()) before attempting to set the configuration property.
-- Jason
On Tuesday, August 8, 2017 at 7:42:17 AM UTC-4, Ohad Pinchevsky wrote:
Hi,
I am trying to increase/disable the cache expiration time using the cache.db-cache-time property. I changed the value to 0 and restarted the Gremlin server, but it seems it is not working (based on execution time: the first run is slow, the second fast, and after waiting, a third run is slow again).
What am I missing?
Thanks, Ohad
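For reference, a minimal Gremlin-Groovy sketch of the same ManagementSystem steps run directly against an embedded graph (this assumes every other instance, including Gremlin Server, has been shut down first; the properties file path and the 360000 ms value are only examples):
graph = JanusGraphFactory.open('conf/gremlin-server/janusgraph-cassandra-es-server.properties')
mgmt = graph.openManagement()
mgmt.getOpenInstances()                     // verify this is the only open instance
mgmt.set('cache.db-cache-time', 360000)     // value in milliseconds
mgmt.commit()
graph.close()
// restart Gremlin Server afterwards so it picks up the new value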
How can we bulk load the edges while we already have the vertices in our JanusGraph DB?
Assume we have the vertices in the DB and we have the edge information in GraphSON/XML/TXT. How can we import the edges into JanusGraph?
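One possible pattern (not an answer from this thread) is to resolve each endpoint through an indexed property and add the edge directly. The Gremlin-Groovy sketch below assumes a plain text file of comma-separated triples (source, edge label, target) and that every existing vertex carries a unique, indexed 'name' property -- the file name, property key, and config path are all illustrative:
graph = JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties')   // example path
g = graph.traversal()
new File('edges.txt').eachLine { line ->
    def parts = line.split(',')
    def from = g.V().has('name', parts[0]).next()
    def to   = g.V().has('name', parts[2]).next()
    from.addEdge(parts[1], to)
}
graph.tx().commit()    // for large files, commit in batches instead of once at the end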
Re: Index on a vertex label from Java
Peter Schwarz <kkup...@...>
Not the answer I was hoping for, but thanks!
On Tuesday, August 8, 2017 at 8:15:48 AM UTC-7, Jason Plurad wrote:
You can't create an index on a vertex label right now. See https://github.com/JanusGraph/janusgraph/issues/283
You can create an index on a property. For example, you could define a property called "mylabel", create a composite index on it, then do g.V().has("mylabel", "foo").count().next().
On Monday, August 7, 2017 at 5:06:19 PM UTC-4, Peter Schwarz wrote:
How does one create an index on a vertex label from Java? I want to speed up queries that retrieve or count the vertices with a particular label, e.g. g.V().hasLabel("foo").count().next(). In Gremlin-Groovy, I think you can use getPropertyKey(T.label) to reference the key that represents a label and pass that to addKey, but this does not work in Java because getPropertyKey expects a String and T.label is an enum. What's the right way to do this?
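A Gremlin-Groovy sketch of the workaround Jason describes, assuming an open graph `graph` and traversal source `g`; the property key, index name, and label are illustrative only:
mgmt = graph.openManagement()
mylabel = mgmt.makePropertyKey('mylabel').dataType(String.class).make()
mgmt.buildIndex('byMylabel', Vertex.class).addKey(mylabel).buildCompositeIndex()
mgmt.commit()
// write the label redundantly into the property when creating vertices
g.addV('foo').property('mylabel', 'foo').iterate()
graph.tx().commit()
// indexed lookup instead of a full scan over labels
g.V().has('mylabel', 'foo').count().next()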
Currently the JanusGraph vertex ID is a 64-bit long. Is it possible to also support UUID as the vertex ID?
Re: [BLOG] Configuring JanusGraph for spark-yarn
Joe Obernberger <joseph.o...@...>
Could you let us know a little more about your configuration? What is your storage backend for JanusGraph (HBase/Cassandra)? I actually do not see an error in your log, but at the very least you'll need to define spark.executor.extraClassPath to point to the various jars required. Are there other logs you can look at, such as the container logs for YARN?
I assume you've seen Marc's blog post:
http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html
-Joe
On 8/8/2017 3:45 PM, Gariee wrote:
Hi Marc,
Request your help. I am running JanusGraph with MapR-DB as backend; I have successfully been able to create the Graph of the Gods example on M7 as backend.
But when I am trying to execute the following, where the cluster is MapR and Spark runs on YARN:
plugin activated: tinkerpop.tinkergraph
gremlin> graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
==>hadoopgraph[gryoinputformat->nulloutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().count()
hadoop-load.properties (I tried all different combinations as commented below; each time it's the same error):
#
# SparkGraphComputer Configuration
#
spark.master=yarn-client
spark.yarn.queue=cmp
mapred.job.queue.name=cmp
#spark.driver.allowMultipleContexts=true
#spark.executor.memory=4g
#spark.ui.port=20000
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore*.jar:/opt/mapr/lib/libprotodefs*.jar:/opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
#spark.executor.instances=10
#spark.executor.cores=2
#spark.executor.CoarseGrainedExecutorBackend.cores=2
#spark.executor.CoarseGrainedExecutorBackend.driver=FIXME
#spark.executor.CoarseGrainedExecutorBackend.stopping=false
#spark.streaming.stopGracefullyOnShutdown=true
#spark.yarn.driver.memoryOverhead=4g
#spark.yarn.executor.memoryOverhead=1024
#spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.0.0-2557
--------------------------------------------------------------------------------------------------------------
yarn log
------------------------------------------------------------------------------------------------------------
When the last command is executed, the driver abruptly shuts down in the YARN container and shuts down the Spark context too, with the following error from the YARN logs:
Container: container_e27_1501284102300_47651_01_000008 on abcd.com_8039
LogType:stderr
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:2441
Log Contents:
17/07/31 14:08:05 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
17/07/31 14:08:05 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/07/31 14:08:06 INFO Remoting: Starting remoting
17/07/31 14:08:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutorActorSystem@...:36376]
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 36376.
17/07/31 14:08:06 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-mapr/nm-local-dir/usercache/cmphs/appcache/application_1501284102300_47651/blockmgr-244e0062-016e-4402-85c9-69f2ab9ef9d2
17/07/31 14:08:06 INFO storage.MemoryStore: MemoryStore started with capacity 2.7 GB
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@...:43768
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
17/07/31 14:08:06 INFO executor.Executor: Starting executor ID 7 on host abcd.com
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44208.
17/07/31 14:08:06 INFO netty.NettyBlockTransferService: Server created on 44208
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Registered BlockManager
17/07/31 14:08:41 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/07/31 14:08:41 INFO storage.MemoryStore: MemoryStore cleared
17/07/31 14:08:41 INFO storage.BlockManager: BlockManager stopped
17/07/31 14:08:41 INFO util.ShutdownHookManager: Shutdown hook called
End of LogType:stderr
LogType:stdout
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:0
Log Contents:
End of LogType:stdout
I am stuck with this error, request help here.
Re: [BLOG] Configuring JanusGraph for spark-yarn
Hi Marc,
Request your help. I am running JanusGraph with MapR-DB as backend; I have successfully been able to create the Graph of the Gods example on M7 as backend.
But when I am trying to execute the following, where the cluster is MapR and Spark runs on YARN:
plugin activated: tinkerpop.tinkergraph
gremlin> graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
==>hadoopgraph[gryoinputformat->nulloutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().count()
hadoop-load.properties (I tried all different combinations as commented below; each time it's the same error):
#
# SparkGraphComputer Configuration
#
spark.master=yarn-client
spark.yarn.queue=cmp
mapred.job.queue.name=cmp
#spark.driver.allowMultipleContexts=true
#spark.executor.memory=4g
#spark.ui.port=20000
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore*.jar:/opt/mapr/lib/libprotodefs*.jar:/opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
#spark.executor.instances=10
#spark.executor.cores=2
#spark.executor.CoarseGrainedExecutorBackend.cores=2
#spark.executor.CoarseGrainedExecutorBackend.driver=FIXME
#spark.executor.CoarseGrainedExecutorBackend.stopping=false
#spark.streaming.stopGracefullyOnShutdown=true
#spark.yarn.driver.memoryOverhead=4g
#spark.yarn.executor.memoryOverhead=1024
#spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.0.0-2557
--------------------------------------------------------------------------------------------------------------
yarn log
------------------------------------------------------------------------------------------------------------
When the last command is executed, the driver abruptly shuts down in the YARN container and shuts down the Spark context too, with the following error from the YARN logs:
Container: container_e27_1501284102300_47651_01_000008 on abcd.com_8039
LogType:stderr
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:2441
Log Contents:
17/07/31 14:08:05 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
17/07/31 14:08:05 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/07/31 14:08:06 INFO Remoting: Starting remoting
17/07/31 14:08:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutorActorSystem@...:36376]
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 36376.
17/07/31 14:08:06 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-mapr/nm-local-dir/usercache/cmphs/appcache/application_1501284102300_47651/blockmgr-244e0062-016e-4402-85c9-69f2ab9ef9d2
17/07/31 14:08:06 INFO storage.MemoryStore: MemoryStore started with capacity 2.7 GB
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@...:43768
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
17/07/31 14:08:06 INFO executor.Executor: Starting executor ID 7 on host abcd.com
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44208.
17/07/31 14:08:06 INFO netty.NettyBlockTransferService: Server created on 44208
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Registered BlockManager
17/07/31 14:08:41 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/07/31 14:08:41 INFO storage.MemoryStore: MemoryStore cleared
17/07/31 14:08:41 INFO storage.BlockManager: BlockManager stopped
17/07/31 14:08:41 INFO util.ShutdownHookManager: Shutdown hook called
End of LogType:stderr
LogType:stdout
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:0
Log Contents:
End of LogType:stdout
I am stuck with this error, request help here.
Issue when trying to use Spark Graph Computer
Hi, I am trying to execute the following, where the cluster is MapR and Spark runs on YARN:
graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
g = graph.traversal(computer(SparkGraphComputer))
g.V().count()
When the last command is executed, the driver abruptly shuts down, with the following trace from the YARN logs:
Container: container_e27_1501284102300_47651_01_000008 on abcd.com_8039
LogType:stderr
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:2441
Log Contents:
17/07/31 14:08:05 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
17/07/31 14:08:05 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/07/31 14:08:06 INFO Remoting: Starting remoting
17/07/31 14:08:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutorActorSystem@...:36376]
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 36376.
17/07/31 14:08:06 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-mapr/nm-local-dir/usercache/cmphs/appcache/application_1501284102300_47651/blockmgr-244e0062-016e-4402-85c9-69f2ab9ef9d2
17/07/31 14:08:06 INFO storage.MemoryStore: MemoryStore started with capacity 2.7 GB
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@...:43768
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
17/07/31 14:08:06 INFO executor.Executor: Starting executor ID 7 on host abcd.com
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44208.
17/07/31 14:08:06 INFO netty.NettyBlockTransferService: Server created on 44208
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Registered BlockManager
17/07/31 14:08:41 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/07/31 14:08:41 INFO storage.MemoryStore: MemoryStore cleared
17/07/31 14:08:41 INFO storage.BlockManager: BlockManager stopped
17/07/31 14:08:41 INFO util.ShutdownHookManager: Shutdown hook called
End of LogType:stderr
LogType:stdout
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:0
Log Contents:
End of LogType:stdout
I am stuck with this error, request help here.
Re: Creating a gremlin pipeline from an arraylist
Jason Plurad <plu...@...>
You probably should benchmark it, but I'd think that the injection would be faster since you already have the edges resolved. I think using the graph step g.E(a) would ultimately re-lookup the edges by id.
On Tuesday, August 8, 2017 at 12:23:50 PM UTC-4, Raymond Canzanese wrote: I have an arraylist a of edges that I want to make gremlin queries over. In the old days, I would do:
a._()
And have a pipeline I could work with. Now it seems I can do:
g.inject(a).unfold()
or
g.E(a)
Which of these techniques should I prefer? Is one of them more efficient than the other?
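To make the two options concrete, a small Gremlin-Groovy sketch (the 'weight' property is only an example). As Jason notes, the inject version traverses the already-resolved objects, while E() goes back to the graph by id:
a = g.E().limit(10).toList()              // stand-in for your existing list of edges
g.inject(a).unfold().values('weight')     // works directly on the in-memory edge objects
g.E(a.toArray()).values('weight')         // re-resolves the same edges by id in the graph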
Re: A few questions about JanusGraph.
Jason Plurad <plu...@...>
Sounds like the documentation could use some improvements to help make this more clear. I've opened up an issue to track it.
1) What is the relation between Gremlin server (bin/gremlin-server.bat) and the JanusGraph server (bin/janusgraph.sh)?
The pre-packaged distribution of JanusGraph starts an instance of Cassandra, Elasticsearch, and Gremlin Server to allow users to get started quickly. You can start a Gremlin Server manually with bin/gremlin-server.sh.
2) Properties in janusgraph-cassandra-es-server.properties vs. JanusGraphFactory.build().set()...open()
If you want to connect to the same graph that the Gremlin Server has defined, yes, you should use the same properties. Using a properties file could make this easier to reuse -- JanusGraphFactory.open("conf/gremlin-server/janusgraph-cassandra-es-server.properties") -- but if you use JanusGraphFactory.build().set()...open() with the same properties, you'll be connecting to the same graph.
3) In the above API, I haven't specified the JanusGraph server endpoint (the URL or the port), so which server is my Java code connecting to?
Your code is connecting to the Cassandra server. When you configure a graph using JanusGraphFactory.open(), your application is creating an embedded graph instance. It is not connecting to the graph instance running on the JanusGraph server. The graph data is ultimately stored in Cassandra, so both the JanusGraph Server and your application are working with the same graph data. That being said, you could connect to the graph instance on the Gremlin Server using a remote connection as described in the TinkerPop docs.
4) Does the Java API use websockets, and can the JanusGraph server run on a different machine (right now, my Cassandra and Gremlin server run on the same machine)?
In the scenario where you have an embedded graph instance, your calls to the graph are not using WebSockets. Your application is communicating directly with the Cassandra storage backend using Thrift. A Gremlin Server can run on a different machine than the storage backend. The janusgraph-cassandra-es-server.properties lets the Gremlin Server know where to find the storage backend (see the storage.hostname property).
5) Is the Java API the same as the Gremlin language / API?
JanusGraph implements the Apache TinkerPop APIs, including Gremlin. When you are doing graph traversals, you are dealing with TinkerPop's Gremlin language -- i.e. g.V().has("name", "manoj").toList(). The schema and index APIs are specific to JanusGraph because these are not provided by the TinkerPop abstraction.
6) Where is the documentation / examples for the REST API (for adding / querying vertices, edges)?
JanusGraph doesn't currently have much for that at the moment. Gremlin Server can be configured to support an HTTP endpoint which evaluates any Gremlin. It doesn't expose specific endpoints for /vertices or /edges, but you can do all that and more with the Gremlin endpoint.
7) How can one achieve graph namespacing?
If they are completely separate graphs, creating separate keyspaces works great. You could host them all within the same graph by making sure that you don't overlap labels and property names. You could also consider the Partition Strategy.
8) If the graphs have to be stored in different Cassandra keyspaces, how can I connect to these different graphs / keyspaces from the same Java application?
Create a separate graph instance for each keyspace using storage.cassandra.keyspace in the configuration. You can define multiple graphs in the gremlin-server.yaml configuration with different properties files. Similarly, you can connect to multiple graph instances from your application.
On Tuesday, August 8, 2017 at 11:54:15 AM UTC-4, Manoj Waikar wrote: Hi,
I have read the JanusGraph documentation and the GraphOfTheGodsFactory.java file, and I also have a small sample running. However, I am still not clear about the following doubts related to JanusGraph -
1) What is the relation between Gremlin server (bin/gremlin-server.bat) and the JanusGraph server (bin/janusgraph.sh)?
2) I've specified my Cassandra related configuration values in conf/gremlin-server/janusgraph-cassandra-es-server.properties file and this file is being used when running the gremlin server. While using the Java API (from Scala), I do the following -
val graph: JanusGraph = JanusGraphFactory.build().
  set("storage.backend", "cassandra").
  set("storage.hostname", "localhost").
  set("storage.cassandra.keyspace", "MyJG").
  set("storage.username", "username").
  set("storage.password", "password").
  open()
Should I be using the same (conf/gremlin-server/janusgraph-cassandra-es-server.properties) file which I use to start the gremlin server from my Java code?
3) In the above API, I haven't specified the JanusGraph server endpoint (the URL or the port), so which server is my Java code connecting to?
4) Does Java API use websockets, and can JanusGraph server run on a different machine (right now, my Cassandra and gremlin server run on the same machine)?
5) Is Java API the same as Gremlin language / API?
6) Where is the documentation / examples for REST API (for adding / querying vertices, edges)?
7) How can one achieve graph namespacing? So for example, I have to create three different graphs for employees, vehicles and cities, how can I segregate the data for these three graphs? Can I give a name / id to the graph? Or do these graphs have to be stored in different Cassandra keyspaces?
8) If the graphs have to be stored in different Cassandra keyspaces, how can I connect to these different graphs / keyspaces from the same Java application?
Thanks in advance for the help.
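To illustrate points 2, 7 and 8 in Jason's reply above, a brief Gremlin-Groovy sketch; the keyspace names are made up and the properties file path is the one shipped with the distribution:
// reuse the server's properties file so the embedded instance hits the same graph
graph = JanusGraphFactory.open('conf/gremlin-server/janusgraph-cassandra-es-server.properties')
// or open one embedded instance per keyspace for completely separate graphs
employees = JanusGraphFactory.build().
    set('storage.backend', 'cassandra').
    set('storage.hostname', 'localhost').
    set('storage.cassandra.keyspace', 'employees').
    open()
vehicles = JanusGraphFactory.build().
    set('storage.backend', 'cassandra').
    set('storage.hostname', 'localhost').
    set('storage.cassandra.keyspace', 'vehicles').
    open()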
Re: Do We Need Specialized Graph Databases? Benchmarking Real-Time Social Networking Applications
Raymond Canzanese <r...@...>
Looking forward to reading about your colleagues' findings, Jason. Not using indices would certainly at least partially explain the poor performance, given the types of queries they were making.
On Monday, August 7, 2017 at 1:44:33 PM UTC-4, Stephen Mallette wrote:
It did use parameters. They basically forked Jonathan Ellithorpe's work and converted all the embedded Gremlin to strings.
Not sure how much they modified the Gremlin statements from the Ellithorpe repo. I stopped digging into it once I didn't see vertex centric indices defined and other data modelling choices I probably wouldn't have taken. LDBC is "complex" in the sense that it takes time to dig into - hasn't really been a priority to me.
I'm not sure why Gremlin Server got smacked around so badly in what they did. I couldn't find anything about how it was set up at all. They used TinkerPop 3.2.3 for their work - there have been a lot of enhancements since then in relation to memory management, so perhaps newer versions would have fared better in their tests. Again, hard to say what could/would have happened without spending a decent amount of time on it.
> then we'll be able to see what improvements can be made to the benchmark itself or within TinkerPop and JanusGraph
very cool, jason. glad your colleagues could spend some time on that. it would be nice to hear what they find.
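As a point of reference on the vertex-centric indices Stephen mentions, defining one in JanusGraph looks roughly like the Gremlin-Groovy sketch below; the 'follows' edge label and 'time' key are made-up examples, not taken from the benchmark, and `graph` is assumed to be an open JanusGraph instance:
mgmt = graph.openManagement()
time = mgmt.makePropertyKey('time').dataType(Long.class).make()
follows = mgmt.makeEdgeLabel('follows').make()
mgmt.buildEdgeIndex(follows, 'followsByTime', Direction.BOTH, Order.decr, time)
mgmt.commit()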
On Mon, Aug 7, 2017 at 1:05 PM, Jason Plurad <p...@...> wrote:
This blew up a while ago on the Twitter last month: https://twitter.com/adriancolyer/status/883226836561518594
The testing setup was less than ideal for Titan. Cassandra isn't really meant for a single node install. The paper picked on Gremlin Server, but it didn't disclose anything about the server configuration. Some of the latency for the Gremlin Server-based runs could have been because they weren't using parameterized script bindings. Using the Gremlin Server is not a requirement for using Titan at all, and I'm aware of projects that don't even use it. There's a team in my company that is trying to reproduce the results in that paper, then we'll be able to see what improvements can be made to the benchmark itself or within TinkerPop and JanusGraph.
On Thursday, August 3, 2017 at 2:05:23 PM UTC-4, Raymond Canzanese wrote:
Has everyone seen this article out of the University of Waterloo, which concludes TinkerPop 3 to be not ready for prime time?
Do We Need Specialized Graph Databases? Benchmarking Real-Time Social Networking Applications
Anil Pacaci, Alice Zhou, Jimmy Lin, and M. Tamer Özsu. 10.1145/3078447.3078459
Interested to know what other folks think of this testing setup and set of conclusions.
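For context on the parameterized-bindings point in Jason's reply, this is a hedged Gremlin-Groovy sketch of submitting a script with bindings through the TinkerPop driver (not taken from the paper or this thread; the yaml file and the vid value are placeholders):
cluster = Cluster.open('conf/remote.yaml')
client = cluster.connect()
// the script text stays constant across requests and only the binding changes,
// so the server can cache the compiled script instead of recompiling it every time
result = client.submit('g.V(vid).out().count()', [vid: 123]).all().get()
cluster.close()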
Creating a gremlin pipeline from an arraylist
Raymond Canzanese <r...@...>
I have an arraylist a of edges that I want to make gremlin queries over. In the old days, I would do:
a._()
And have a pipeline I could work with. Now it seems I can do:
g.inject(a).unfold()
or
g.E(a)
Which of these techniques should I prefer? Is one of them more efficient than the other?
A few questions about JanusGraph.
Manoj Waikar <mmwa...@...>
Hi,
I have read the JanusGraph documentation and the GraphOfTheGodsFactory.java file, and I also have a small sample running. However, I am still not clear about the following doubts related to JanusGraph -
1) What is the relation between Gremlin server (bin/gremlin-server.bat) and the JanusGraph server (bin/janusgraph.sh)?
2) I've specified my Cassandra related configuration values in conf/gremlin-server/janusgraph-cassandra-es-server.properties file and this file is being used when running the gremlin server. While using the Java API (from Scala), I do the following -
val graph: JanusGraph = JanusGraphFactory.build().
  set("storage.backend", "cassandra").
  set("storage.hostname", "localhost").
  set("storage.cassandra.keyspace", "MyJG").
  set("storage.username", "username").
  set("storage.password", "password").
  open()
Should I be using the same (conf/gremlin-server/janusgraph-cassandra-es-server.properties) file which I use to start the gremlin server from my Java code?
3) In the above API, I haven't specified the JanusGraph server endpoint (the URL or the port), so which server is my Java code connecting to?
4) Does Java API use websockets, and can JanusGraph server run on a different machine (right now, my Cassandra and gremlin server run on the same machine)?
5) Is Java API the same as Gremlin language / API?
6) Where is the documentation / examples for REST API (for adding / querying vertices, edges)?
7) How can one achieve graph namespacing? So for example, I have to create three different graphs for employees, vehicles and cities, how can I segregate the data for these three graphs? Can I give a name / id to the graph? Or do these graphs have to be stored in different Cassandra keyspaces?
8) If the graphs have to be stored in different Cassandra keyspaces, how can I connect to these different graphs / keyspaces from the same Java application?
Thanks in advance for the help.
Re: [BLOG] Configuring JanusGraph for spark-yarn
Joe Obernberger <joseph.o...@...>
Hi Marc - thank you very much for your reply. I like your idea about moving regions manually and will try that. As to OLAP vs OLTP (I assume Spark vs none), yes I have those times.
For a 1.5G table in HBase, the count just using the gremlin shell without using the SparkGraphComputer:
graph = JanusGraphFactory.open('conf/graph.properties')
g=graph.traversal()
g.V().count()
takes just under 1 minute. Using Spark it takes about 2 hours. So something isn't right. They both return 3,842,755 vertices. When I run it with Spark, it hits one of the region servers hard - doing over 30k requests per second for those 2 hours.
-Joe
On 8/8/2017 3:17 AM, HadoopMarc wrote:
Hi Joseph,
You ran into terrain I have not yet covered myself. Up till now I have been using the graben1437 PR for Titan, and for OLAP I adopted a poor man's approach where node id's are distributed over spark tasks and each spark executor makes its own Titan/HBase connection. This performs well, but does not have the nice abstraction of the HBaseInputFormat.
So, no clear answer to this one, but just some thoughts:
- could you try to move some regions manually and see what it does to performance?
- how do your OLAP vertex count times compare to the OLTP count times?
- how does the sum of spark task execution times compare to the yarn start-to-end time difference you reported? In other words, how much of the start-to-end time is spent in waiting for timeouts?
- unless you managed to create a vertex with > 1GB size, the RowTooBigException sounds like a bug (which you can report on JanusGraph's github page). HBase does not like large rows at all, so vertex/edge properties should not have blob values.
@(David Robinson): do you have any additional thoughts on this?
Cheers, Marc
On Monday, August 7, 2017 at 23:12:02 UTC+2, Joseph Obernberger wrote:
Hi Marc - I've been able to get it to run longer, but am now getting a RowTooBigException from HBase. How does JanusGraph store data in HBase? The current max size of a row is 1GByte, which makes me think this error is covering something else up.
What I'm seeing so far in testing with a 5 server cluster - each machine with 128G of RAM:
HBase table is 1.5G in size, split across 7 regions, and has 20,001,105 rows. To do a g.V().count() takes 2 hours and results in 3,842,755 vertices.
Another HBase table is 5.7G in size, split across 10 regions, is 57,620,276 rows, and took 6.5 hours to run the count and results in 10,859,491 nodes. When running, it looks like it hits one server very hard even though the YARN tasks are distributed across the cluster. One HBase node gets hammered.
The RowTooBigException is below. Anything to try? Thank you for any help!
org.janusgraph.core.JanusGraphException: Could not
process individual retrieval call
at
org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:257)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1269)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1137)
at
org.janusgraph.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:209)
at
org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:75)
at
org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
at
org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
at
org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at
org.janusgraph.hadoop.formats.util.input.current.JanusGraphHadoopSetupImpl.getTypeInspector(JanusGraphHadoopSetupImpl.java:60)
at
org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.<init>(JanusGraphVertexDeserializer.java:55)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat.lambda$static$0(GiraphInputFormat.java:49)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat$RefCountedCloseable.acquire(GiraphInputFormat.java:100)
at
org.janusgraph.hadoop.formats.util.GiraphRecordReader.<init>(GiraphRecordReader.java:47)
at
org.janusgraph.hadoop.formats.util.GiraphInputFormat.createRecordReader(GiraphInputFormat.java:67)
at
org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:166)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.janusgraph.core.JanusGraphException:
Could not call index
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1262)
at
org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:255)
... 34 more
Caused by: org.janusgraph.core.JanusGraphException:
Could not execute operation due to backend exception
at
org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
at
org.janusgraph.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:444)
at
org.janusgraph.diskstorage.BackendTransaction.indexQuery(BackendTransaction.java:395)
at
org.janusgraph.graphdb.query.graph.MultiKeySliceQuery.execute(MultiKeySliceQuery.java:51)
at
org.janusgraph.graphdb.database.IndexSerializer.query(IndexSerializer.java:529)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.lambda$call$5(StandardJanusGraphTx.java:1258)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:97)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:89)
at
org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:81)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1258)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1255)
at
com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
at
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
at
org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1255)
... 35 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException:
Could not successfully complete backend operation due to
repeated temporary exceptions after PT10S
at
org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
at
org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
... 53 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException:
Temporary failure in storage backend
at
org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getHelper(HBaseKeyColumnValueStore.java:202)
at
org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getSlice(HBaseKeyColumnValueStore.java:90)
at
org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
at
org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
at
org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:398)
at
org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:395)
at
org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
... 54 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Failed after attempts=35, exceptions:
Sat Aug 05 07:22:03 EDT 2017, RpcRetryingCaller{globalStartTime=1501932111280,
pause=100, retries=35}, org.apache.hadoop.hbase.regionserver.RowTooBigException:
rg.apache.hadoop.hbase.regionserver.RowTooBigException:
Max row size allowed: 1073741824, but the row is bigger
than that.
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:564)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5697)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5856)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5634)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5611)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5597)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6792)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6770)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2023)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
On 8/6/2017 3:50 PM, HadoopMarc wrote:
Hi ... and others, I have been offline for a few weeks enjoying a holiday and will start looking into your questions and make the suggested corrections. Thanks for following the recipes and helping others with it.
..., did you run the recipe on the same HDP sandbox and same Tinkerpop version? I remember (from 4 weeks ago) that copying the zookeeper.znode.parent property from the hbase configs to the janusgraph configs was essential to get janusgraph's HBaseInputFormat working (that is: read graph data for the spark tasks).
Cheers, Marc
On Monday, July 24, 2017 at 10:12:13 AM UTC+2, spi...@... wrote:
Hi, thanks for your post. I did it according to the post, but I ran into a problem.
15:58:49,110 INFO SecurityManager:58
- Changing view acls to: rc
15:58:49,110 INFO SecurityManager:58
- Changing modify acls to: rc
15:58:49,110 INFO SecurityManager:58
- SecurityManager: authentication
disabled; ui acls disabled; users with
view permissions: Set(rc); users with
modify permissions: Set(rc)
15:58:49,111 INFO Client:58 -
Submitting application 25 to
ResourceManager
15:58:49,320 INFO YarnClientImpl:274
- Submitted application
application_1500608983535_0025
15:58:49,321 INFO
SchedulerExtensionServices:58 - Starting
Yarn extension services with app
application_1500608983535_0025 and
attemptId None
15:58:50,325 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:50,326 INFO Client:58 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1500883129115
final status: UNDEFINED
user: rc
15:58:51,330 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:52,333 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:53,335 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:54,337 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:55,340
INFO Client:58 - Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:56,343 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
ACCEPTED)
15:58:56,802 INFO
YarnSchedulerBackend$YarnSchedulerEndpoint:58
- ApplicationMaster registered as
NettyRpcEndpointRef(null)
15:58:56,824 INFO JettyUtils:58 -
Adding filter:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15:58:57,346 INFO Client:58 -
Application report for
application_1500608983535_0025 (state:
RUNNING)
15:58:57,347 INFO Client:58 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.200.48.154
ApplicationMaster RPC port: 0
queue: default
start time: 1500883129115
final status: UNDEFINED
user: rc
15:58:57,348 INFO
YarnClientSchedulerBackend:58 -
Application application_1500608983535_0025
has started running.
15:58:57,358 INFO Utils:58 -
Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService'
on port 47514.
15:58:57,358 INFO
NettyBlockTransferService:58 - Server
created on 47514
15:58:57,360 INFO
BlockManagerMaster:58 - Trying to register
BlockManager
15:58:57,363 INFO
BlockManagerMasterEndpoint:58 -
Registering block manager 10.200.48.112:47514
with 2.4 GB RAM, BlockManagerId(driver,
10.200.48.112, 47514)15:58:57,366 INFO
BlockManagerMaster:58 - Registered
BlockManager
15:58:57,585 INFO
EventLoggingListener:58 - Logging events
to hdfs:///spark-history/application_1500608983535_0025
15:59:07,177 WARN
YarnSchedulerBackend$ YarnSchedulerEndpoint:70
- Container marked as failed:
container_e170_1500608983535_ 0025_01_000002
on host: dl-rc-optd-ambari-slave-v-test-1.host.dataengine.com.
Exit status: 1. Diagnostics: Exception
from container-launch.
Container id:
container_e170_1500608983535_0025_01_000002
Exit code: 1
Stack trace: ExitCodeException
exitCode=1:
at
org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at
org.apache.hadoop.util.Shell.run(Shell.java:487)
at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:371)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at
java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at
java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : run as user is rc
main : requested yarn user is rc
Container exited with a non-zero exit
code 1
java.io.IOException: Connection reset
by peer
at
sun.nio.ch.FileDispatcherImpl.read0(Native
Method)
at
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at
sun.nio.ch.IOUtil.read(IOUtil.java:192)
at
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at
io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at
java.lang.Thread.run(Thread. java:748)
15:59:57,706 WARN
NettyRpcEndpointRef:91 - Error sending
message [message =
RequestExecutors(0,0,Map())] in 1
attempts
java.io.IOException: Connection reset
by peer
at
sun.nio.ch.FileDispatcherImpl.read0(Native
Method)
at
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at
sun.nio.ch.IOUtil.read(IOUtil.java:192)
at
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at
io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at
java.lang.Thread.run(Thread.java:748)
I am confused about that. Could you please help me?
On Thursday, July 6, 2017 at 4:15:37 PM UTC+8, HadoopMarc wrote:
Re: Index on a vertex label from Java
Jason Plurad <plu...@...>
You can't create an index on a vertex label right now. See https://github.com/JanusGraph/janusgraph/issues/283
You can create an index on a property. For example, you could define a property called "mylabel", create a composite index on it, then do g.V().has("mylabel", "foo").count().next().
On Monday, August 7, 2017 at 5:06:19 PM UTC-4, Peter Schwarz wrote: How does one create an index on a vertex label from Java? I want to speed up queries that retrieve or count the vertices with a particular label, e.g. g.V().hasLabel("foo").count().next(). In Gremlin-Groovy, I think you can use getPropertyKey(T.label) to reference the key that represents a label and pass that to addKey, but this does not work in Java because getPropertyKey expects a String and T.label is an enum. What's the right way to do this?
Re: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space while loading bulk data
Misha Brukman <mbru...@...>
You might consider using a format other than JSON which can easily be read incrementally, such as CSV or a more compact binary encoding, or you may want to use a streaming JSON reader which will read as little as possible to generate a meaningful callback. Several options exist (I have not tested these). If you search on Stack Overflow, you'll find others have had exactly the same issue as here with Python and JSON files, and the answers to those questions all pointed to incremental JSON parser libraries for Python as the solution to the OOMs.
On Tue, Aug 8, 2017 at 10:49 AM, Amyth Arora <aroras....@...> wrote: Thanks Robert, I am going to check the file size and its contents when I reach home and also will try to load the file through the shell and post the update here. On Tue, 8 Aug 2017 at 8:14 PM, Robert Dale < rob...@...> wrote: Well, whatever, but your stacktrace points to: String fileContents = new File(jsonPath).getText('UTF-8' ) Thus, your file does not fit in memory - either available system memory or within jvm max memory.
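The specific libraries Misha linked did not survive the digest, so as one concrete (untested) illustration of the streaming approach in Groovy: Jackson's streaming parser, if it is on your classpath, can walk the file token by token instead of slurping it into one String with getText(). The jsonPath variable mirrors the one in the stack trace; the loop body is only a placeholder:
import com.fasterxml.jackson.core.JsonFactory
import com.fasterxml.jackson.core.JsonToken

def parser = new JsonFactory().createParser(new File(jsonPath))
while (parser.nextToken() != null) {
    if (parser.getCurrentToken() == JsonToken.START_OBJECT) {
        // handle one vertex/edge object here and commit in batches,
        // instead of loading the entire file into memory at once
    }
}
parser.close()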
Re: Cache expiration time
Jason Plurad <plu...@...>
According to the docs, it is a GLOBAL_OFFLINE configuration setting: "These options can only be changed for the entire database cluster at once when all instances are shut down." You'll need to set the value using the ManagementSystem. If you want to do it through a remote console session, you could try something like this:
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
gremlin> :> mgmt = graph.openManagement(); mgmt.set('cache.db-cache-time', 360000); mgmt.commit(); true
==>true
At this point, the value is set but it is not active. You need to restart the Gremlin Server so the new configuration is picked up. Another thing you should be aware of when working with GLOBAL_OFFLINE properties is that you can't change the value if there are multiple open graph instances -- for example, you have the Gremlin Server started and also make a direct connection with JanusGraphFactory.open(). You should shut down all connections so there is only 1 remaining (you can verify with mgmt.getOpenInstances()) before attempting to set the configuration property.
-- Jason
On Tuesday, August 8, 2017 at 7:42:17 AM UTC-4, Ohad Pinchevsky wrote: Hi,
I am trying to increase/disable the cache expiration time using the cache.db-cache-time property. I changed the value to 0 and restarted the Gremlin server, but it seems it is not working (based on execution time: the first run is slow, the second fast, and after waiting, a third run is slow again).
What am I missing?
Thanks, Ohad
Re: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space while loading bulk data
Amyth Arora <aroras....@...>
Thanks Robert, I am going to check the file size and its contents when I reach home and also will try to load the file through the shell and post the update here.
On Tue, 8 Aug 2017 at 8:14 PM, Robert Dale < rob...@...> wrote: Well, whatever, but your stacktrace points to: String fileContents = new File(jsonPath).getText('UTF-8') Thus, your file does not fit in memory - either available system memory or within jvm max memory.
Re: Jetty ALPN/NPN has not been properly configured.
Amyth Arora <aroras....@...>
Hi Misha,
Yes, I did, I am sorry. I have found the solution to this. Am going to post it and close the issue on github once I reach home. Sorry about the duplicate issue.
On Tue, 8 Aug 2017 at 7:56 PM, Misha Brukman < mbru...@...> wrote: Looks like you've also filed a GitHub issue on this: https://github.com/JanusGraph/janusgraph/issues/450 — please use either the mailing list or GitHub to report issues, but not both, as duplication isn't helpful.
Since this might be a bug, let's follow up on GitHub.
Re: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space while loading bulk data
Well, whatever, but your stacktrace points to: String fileContents = new File(jsonPath).getText('UTF-8') Thus, your file does not fit in memory - either available system memory or within jvm max memory.
On Tue, Aug 8, 2017 at 10:38 AM, Amyth Arora <aroras....@...> wrote: Hi Robert,
The file is about 325 mb in size and contains info about a million vertices and a million edges. Also I forgot to mention that prior to testing on bigtable I tried the same script and file to test janus with cassandra backend on the same machine, which worked fine.
While testing this with Cassandra, I experienced a similar issue, which went away by the introduction of the following configuration options:
storage.batch-loading
ids.block-size
But in the case of Cassandra, the error was thrown while creating the edges. In the case of Bigtable, the exception is thrown as soon as the script is executed.
On Tue, 8 Aug 2017 at 7:51 PM, Robert Dale < rob...@...> wrote: Looks like your file doesn't fit in memory.
Re: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space while loading bulk data
Amyth Arora <aroras....@...>
Hi Robert,
The file is about 325 mb in size and contains info about a million vertices and a million edges. Also I forgot to mention that prior to testing on bigtable I tried the same script and file to test janus with cassandra backend on the same machine, which worked fine.
While testing this with Cassandra, I experienced a similar issue, which went away by the introduction of the following configuration options:
storage.batch-loading
ids.block-size
But in the case of Cassandra, the error was thrown while creating the edges. In the case of Bigtable, the exception is thrown as soon as the script is executed.
On Tue, 8 Aug 2017 at 7:51 PM, Robert Dale < rob...@...> wrote: Looks like your file doesn't fit in memory.
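For reference, the two options Amyth mentions are ordinary JanusGraph settings and can be supplied when opening the graph. The Gremlin-Groovy sketch below is only an example of where they go -- the backend and the values shown are placeholders, not tuning advice:
graph = JanusGraphFactory.build().
    set('storage.backend', 'cassandra').
    set('storage.hostname', 'localhost').
    set('storage.batch-loading', true).
    set('ids.block-size', 100000).
    open()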