Graph Databases and Framework Meetup Group

Ioannis Papapanagiotou <idu...@...>

Dear all,

I created a generic meetup group for graph databases and frameworks in the Bay Area. We (at Netflix) are thinking of hosting a meetup in the near future, so we're kicking the tires.
Please feel free to join: https://www.meetup.com/Graph-Databases-and-Frameworks/

Thank you,
Ioannis Papapanagiotou


Question/DISCUSS: Loading data into JanusGraph and Data Format

David Pitera <piter...@...>

Hello all,

I thought it might be beneficial to the community as a whole if we got a discussion going where users answered the following questions:

1. How are you currently loading data into JanusGraph? Are you using any specific tools? OSS tools? A custom tool that you should OSS and share?
2. What format is your data sitting in? CSV? GraphSON? GraphML? A Postgres database?
3. What are the pain points you've experienced when loading / trying to load data into JanusGraph?
4. Are you stream-loading? If so, what is the use case, and what tools are you using?
5. Are you exporting data out of JanusGraph? If so, why? How? To where?

Looking forward to hearing what you all have to say!
David


Re: A few observations about JanusGraph scripts / config files.

Robert Dale <rob...@...>

I am unable to find '8183' ever having existed in the code base.  

Robert Dale

On Wed, Aug 16, 2017 at 2:26 AM, Manoj Waikar <mmwa...@...> wrote:
Hi,

On Windows, a terminal client like Babun can run the bin/janusgraph.sh script. I tried running it, but it doesn't work for me; this is the output:

Forking Cassandra...
Running `nodetool statusthrift`............ timeout exceeded (60 seconds)

Also, section "7.1.1.1. Connecting to Gremlin Server" suggests connecting to the remote server using the conf/remote.yaml file, where the port is given as 8183. However, the default WebSocket port in various other configuration files (such as conf/gremlin-server/gremlin-server.yaml) is 8182. I was able to connect to the remote server by changing the port from 8183 to 8182.

Should the port in the remote.yaml file be changed from 8183 to 8182?
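Before editing conf/remote.yaml, it can help to confirm which of the two ports actually has a listener. A minimal sketch in plain Java, assuming the server runs on localhost; the class and method names are illustrative and not part of JanusGraph or TinkerPop. A successful TCP connect only shows that something is listening on the port, not that it is Gremlin Server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    // Returns true if something accepts a TCP connection on host:port within timeoutMillis.
    // This proves a listener exists; it does not prove the listener is Gremlin Server.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host: treat as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // 8182 is the default in conf/gremlin-server/gremlin-server.yaml;
        // 8183 is the value reported in conf/remote.yaml.
        for (int port : new int[] {8182, 8183}) {
            System.out.println(port + " listening: " + isListening("localhost", port, 1000));
        }
    }
}
```

If only 8182 answers, that matches the gremlin-server.yaml default and the change Manoj describes.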



JanusGraph and Cassandra modes.

Manoj Waikar <mmwa...@...>

Hi,

The Cassandra related JanusGraph documentation specifies various ways in which JanusGraph can be used in concert with Cassandra.

So, if I run Cassandra on my machine (using cassandra -f) and then do the following from my Java / Scala code:
JanusGraph g = JanusGraphFactory.build().set("storage.backend", "cassandra").set("storage.hostname", "127.0.0.1").open();

then:
  1. I am using the Local Server Mode.
  2. If Cassandra is instead running on another machine and I replace 127.0.0.1 (localhost) with the IP of that server, I am using the Remote Server Mode.
  3. Also, when Jason replied to my previous question 3 and said "your application is creating an embedded graph instance", he didn't mean the JanusGraph Embedded Mode (since JanusGraph is clearly not running in the same JVM instance as Cassandra here)?
Is my understanding correct?

So, when is the Remote Server Mode with Gremlin Server useful? Is it for when non-Java applications need to communicate with Gremlin Server?

Also, if I have to host a web application (written in Java / Scala, on my own server) that stores data in Cassandra, which mode is best? Is it the local or the remote server mode, depending on where Cassandra resides relative to the web server?

Thanks in advance for the replies / help.


Re: Hardware Calculation

Amyth Arora <aroras....@...>

Our graph will have the following numbers:

Nodes: 557,159,709
Properties: 1,123,954,549
Relationships: 1,401,451,959

Currently our graph runs on Neo4j, but we are in the process of migrating to a JanusGraph/Cassandra stack. Neo4j has a hardware sizing calculator (https://neo4j.com/hardware-sizing/). Is there something similar available for JanusGraph/Cassandra?
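The thread does not point to a JanusGraph/Cassandra sizing calculator. As a starting point for the arithmetic only, the sketch below totals the element counts above and multiplies them by per-element sizes. Every byte size, the edge-copy factor and the replication factor are hypothetical placeholders to be replaced with numbers measured on a sample load; edgeCopies = 2 assumes each edge is written once per endpoint vertex. It ignores indexes, compaction headroom and compression.

```java
class RawSizeEstimate {
    // Element counts quoted in the post above.
    static final long NODES = 557_159_709L;
    static final long PROPERTIES = 1_123_954_549L;
    static final long RELATIONSHIPS = 1_401_451_959L;

    static long totalElements() {
        return NODES + PROPERTIES + RELATIONSHIPS;
    }

    // Rough raw-data estimate in bytes. All sizes and factors are caller-supplied
    // ASSUMPTIONS, not measured JanusGraph or Cassandra values.
    static long estimateBytes(long bytesPerVertex, long bytesPerProperty, long bytesPerEdge,
                              int edgeCopies, int replicationFactor) {
        long perReplica = NODES * bytesPerVertex
                + PROPERTIES * bytesPerProperty
                + RELATIONSHIPS * bytesPerEdge * edgeCopies;
        return perReplica * replicationFactor;
    }

    public static void main(String[] args) {
        System.out.println("Total elements: " + totalElements());
        // Hypothetical placeholder sizes; replace with figures from a measured sample load.
        long bytes = estimateBytes(16, 32, 24, 2, 3);
        System.out.printf("Estimated raw size: %.1f GiB%n", bytes / (1024.0 * 1024 * 1024));
    }
}
```

Measuring the on-disk size of a representative slice (say, one million vertices with their edges) and scaling from there is usually more trustworthy than guessing per-element sizes.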


On 10 August 2017 at 18:12, Amyth Arora <aroras....@...> wrote:
Hi everyone,

Is there a hardware calculator available for JanusGraph to work out how many JanusGraph and Cassandra instances we would require? Any help would be much appreciated.

--
Thanks & Regards
Amyth Arora
Email: aroras....@...
Twitter: @mytharora



Re: Can we create a new API based on JanusGraph?

stan...@...

Hello Rafael,
I've built the API using JanusGraph Server, and I have successfully accessed the API with
```
curl 'http://localhost:8182/?gremlin=g.V()'
```
In the Gremlin Console I can look up my vertex with
```
g.V().has('id_number', '3207221555223216')
```
I want to know how to get the same result through the API.
On Monday, August 14, 2017 at 11:06:52 PM UTC+8, Rafael Fernandes wrote:

No need, my friend, just use JanusGraph Server...

rafael fernandes

On Sunday, August 13, 2017 at 11:27:57 PM UTC-4, hu junjie wrote:
I mean opening a new API to get customized data, or to post formatted data and load it into JanusGraph.
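Since the plain curl call already works, the same lookup can go through that endpoint. The catch is that the quotes, commas and parentheses in the script must be URL-encoded in the query string. A sketch in Java 11+, assuming the server accepts HTTP GET requests at http://localhost:8182 exactly as the working curl call suggests; the class and method names are illustrative.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class GremlinHttpGet {
    // Builds a URL of the same shape as the working curl call, percent-encoding the
    // script so quotes, commas and parentheses survive the query string.
    static String buildUrl(String baseUrl, String gremlinScript) {
        return baseUrl + "/?gremlin=" + URLEncoder.encode(gremlinScript, StandardCharsets.UTF_8);
    }

    // Sends the GET request and returns the raw response body (JSON from the server).
    static String send(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String url = buildUrl("http://localhost:8182", "g.V().has('id_number','3207221555223216')");
        System.out.println(url);
        // System.out.println(send(url));  // uncomment with JanusGraph Server running
    }
}
```

With curl itself, the -G and --data-urlencode options perform the same encoding, so the script can be passed as gremlin=... without escaping it by hand.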


Re: Phantom vertices

Robert Dale <rob...@...>

You probably want to count the edges:

g.V().hasLabel('person').group().by('name').by(outE().count()).order(local).by(values,decr).limit(local,5)

Robert Dale

On Tue, Aug 15, 2017 at 9:53 PM, Rohit Jain <rohit.j...@...> wrote:
I found my phantom person vertex.

So, I was running this query to find the top 5 actors who had acted in the most movies. I don't think the query is correct for what I am trying to do, since the result does not seem right. However, when I first ran this query:

g.V().hasLabel("person").groupCount().by("personid").order(local).by(values,decr).select(keys).limit(local, 5)

I got an error saying that it could not find "personid" for a specific vertex id. I then ran the following for that vertex id:

gremlin> g.V(5578792).valueMap()
==>[]

This told me that I had a vertex that, for some reason, did not have a personid. That was my phantom vertex. I dropped it, and then the above query ran. Its result is ==>[1,2,3,4,5], which I know is not right. From my SQL query using our product EsgynDB running on Apache Trafodion, I get:

PERSON_ID  NAME                                 NUM
---------  -----------------------------------  --------------------
     6830  Robert De Niro                                         53
     5828  Morgan Freeman                                         43
     1053  Bruce Willis                                           38
     5363  Matt Damon                                             37
     4057  Johnny Depp                                            36


Rohit
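Rohit's ==>[1,2,3,4,5] is most likely what groupCount().by("personid") produces here: personid is unique, so every group counts 1, ordering by value has nothing to rank, and select(keys) returns whichever five ids come first. Robert's query instead counts each person's outgoing edges. A library-free Java sketch contrasting the two computations, with plain collections standing in for the graph; the class, the method names and the int[] {personid, movieid} edge encoding are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopActorsSketch {
    // Mirrors groupCount().by("personid"): each personid is unique, so every count is 1
    // and sorting by count cannot rank anyone.
    static Map<Integer, Long> groupCountByPersonId(List<Integer> personIds) {
        return personIds.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // Mirrors group().by('name').by(outE().count()), ordered by value descending, limit n:
    // count the role edges leaving each person, then keep the n names with the most.
    static List<String> topByOutEdgeCount(Map<Integer, String> namesById,
                                          List<int[]> roleEdges, int n) {
        Map<String, Long> outEdgesByName = new HashMap<>();
        for (int[] edge : roleEdges) {              // edge[0] = personid, edge[1] = movieid
            outEdgesByName.merge(namesById.get(edge[0]), 1L, Long::sum);
        }
        return outEdgesByName.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Like Robert's group().by('name'), the sketch merges counts for people who share a name, and it assumes every edge leaving a person is a role edge, as in Rohit's schema.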



Re: Phantom vertices

Rohit Jain <rohit.j...@...>

Robert,

This is EXACTLY what I was looking for! I could not figure out how to do this. I must educate myself better, but I can't seem to find an easy way to do that :-( Maybe SQL is too ingrained in my blood and this learning requires different skills -- teaching an old dog new tricks ain't easy.

Okay, so it seems like I had some demigod, titan, location, god, human, and monster vertices that I don't remember adding.
Now, on the person vertices I get a min personid of 1 and a max of 8491, and I know I have a unique index on it. But the count still shows 8492. This phantom is really strange. I thought I would put the output into a log file, pull it into Excel and figure out what is going on there. I am wondering whether the query is using an index that somehow got corrupted. As you may have seen, I have an index stuck in a strange INSTALLED state that I cannot REINDEX or REMOVE.

Anyway, thanks for this. This is good progress.

Rohit


Re: Phantom vertices

Robert Dale <rob...@...>

Let's see what you've got:

g.V().groupCount().by(label)

g.V().values('movieid').min()
g.V().values('movieid').max()

g.V().values('personid').min()
g.V().values('personid').max()


Robert Dale

On Tue, Aug 15, 2017 at 7:44 PM, Rohit Jain <rohit.j...@...> wrote:
Hi folks,

I created 4916 vertices with the label 'movie' and a property 'movieid', where the movieid goes from 1 to 4916.
I created 8491 vertices with the label 'person' and a property 'personid', where the personid goes from 1 to 8491.
I created 'role' edges from these person vertices to the movie vertices, with a 'roletype' property indicating actor or director.

When I do g.V().count() I get 13420, when I should get 13407.

Looks like I have some phantom vertices. How do I find them?

Also, for g.V().hasLabel('person').count() I get 8492 instead of 8491. So I even have a phantom personid that I don't know how to locate.

Rohit
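Robert's queries boil down to two checks: count vertices per label, and find vertices of a label that lack their id property (in Gremlin, roughly g.V().hasLabel('person').hasNot('personid')). The 13 extra vertices (13420 observed versus 4916 + 8491 = 13407 expected) would show up in the first check. A library-free Java sketch of both checks over an in-memory list; the Vtx type and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PhantomVertexCheck {
    // Hypothetical stand-in for a vertex: an id, a label and its properties.
    static final class Vtx {
        final long id;
        final String label;
        final Map<String, ?> props;

        Vtx(long id, String label, Map<String, ?> props) {
            this.id = id;
            this.label = label;
            this.props = props;
        }
    }

    // Same idea as g.V().groupCount().by(label): how many vertices carry each label.
    static Map<String, Integer> countByLabel(List<Vtx> vertices) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Vtx v : vertices) {
            counts.merge(v.label, 1, Integer::sum);
        }
        return counts;
    }

    // Same idea as g.V().hasLabel(label).hasNot(key).id(): ids of vertices that have
    // the label but lack the key, e.g. a 'person' with no 'personid'.
    static List<Long> missingKey(List<Vtx> vertices, String label, String key) {
        List<Long> ids = new ArrayList<>();
        for (Vtx v : vertices) {
            if (v.label.equals(label) && !v.props.containsKey(key)) {
                ids.add(v.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        int expected = 4916 + 8491;   // movies + persons from the post: 13407
        int observed = 13420;         // reported by g.V().count()
        System.out.println("unexpected vertices: " + (observed - expected)); // prints 13
    }
}
```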



Re: How do I get out of a continuation line (?) in Gremlin

Rohit Jain <rohit.j...@...>

You are so quick, Robert!! :-)

I actually figured it out: I found that :h worked in that situation, and then I saw :c. But you beat me to posting that I had found the solution.

Thanks!!
Rohit


Re: How do I get out of a continuation line (?) in Gremlin

Robert Dale <rob...@...>

The Gremlin Console is built on top of groovysh, so many of the same commands and usage apply. See also the Gremlin Console tutorial: http://tinkerpop.apache.org/docs/current/tutorials/the-gremlin-console/

Robert Dale

On Tue, Aug 15, 2017 at 7:55 PM, Robert Dale <rob...@...> wrote:
:clear

:help


On Tue, Aug 15, 2017 at 19:55 Rohit Jain <rohit.j...@...> wrote:
When I make a syntax error, or have a command that ends up spanning two lines, I end up in this situation:

gremlin> g.V().hasNot('personid','movieid).count()
......1>

How do I get out of this?

--
Robert Dale


Re: Index on a vertex label from Java

Peter Schwarz <kkup...@...>

Hmm, maybe I'll try that, thanks. I briefly considered that approach, but ended up going with indexes on a "pseudo-label" instead.


On Monday, August 14, 2017 at 8:15:53 AM UTC-7, Rafael Fernandes wrote:
I actually needed to do the same thing back when I started my project, but I just ended up using edges instead of vertex labels...
Vertex-centric indexes helped and improved the performance like crazy (today we can even filter by the period we want without any degradation).

If you still can, just create a VCI (vertex-centric index) on an edge, add all your vertices to it, and query against it; your graph will thank you later :)...

rafael fernandes

On Tuesday, August 8, 2017 at 8:34:57 PM UTC-4, Peter Schwarz wrote:
Not the answer I was hoping for, but thanks!

On Tuesday, August 8, 2017 at 8:15:48 AM UTC-7, Jason Plurad wrote:
You can't create an index on a vertex label right now. See https://github.com/JanusGraph/janusgraph/issues/283

You can create an index on a property. For example, you could define a property called "mylabel", create a composite index on it, then do g.V().has("mylabel", "foo").count().next().


On Monday, August 7, 2017 at 5:06:19 PM UTC-4, Peter Schwarz wrote:
How does one create an index on a vertex label from Java? I want to speed up queries that retrieve or count the vertices with a particular label, e.g. g.V().hasLabel("foo").count().next(). In Gremlin-Groovy, I think you can use getPropertyKey(T.label) to reference the key that represents a label and pass that to addKey, but this does not work in Java because getPropertyKey expects a String and T.label is an enum. What's the right way to do this?
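Jason's workaround (a "mylabel" property with a composite index, queried with has()) replaces a scan over every vertex with an exact-match lookup. The library-free Java sketch below models only that idea, with a HashMap from property value to vertex ids standing in for the index; the class name and in-memory storage are hypothetical, and JanusGraph's real composite indexes live in the storage backend. It assumes each vertex is added once and never modified.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ExactMatchIndexSketch {
    // Hypothetical in-memory store: vertex id -> value of the "mylabel" property.
    private final Map<Long, String> mylabelById = new HashMap<>();
    // The "index": property value -> ids of the vertices carrying that value.
    private final Map<String, Set<Long>> index = new HashMap<>();

    void addVertex(long id, String mylabel) {
        mylabelById.put(id, mylabel);
        // Keep the index in sync on every write; this is the cost an index adds.
        index.computeIfAbsent(mylabel, k -> new HashSet<>()).add(id);
    }

    // Without an index: every vertex has to be visited and compared.
    long countByScan(String value) {
        return mylabelById.values().stream().filter(value::equals).count();
    }

    // With an exact-match index: one lookup, like g.V().has("mylabel", value).count().
    long countByIndex(String value) {
        return index.getOrDefault(value, Set.of()).size();
    }
}
```

The trade-off is the one the thread turns on: the index must be maintained on every write, in exchange for answering an equality lookup without touching every vertex.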


Re: Load CSV file to addVertex and addEdge

hu junjie <hjj...@...>

Solved.

On Tuesday, August 15, 2017 at 9:48:37 PM UTC+8, hu junjie wrote:

I found an alternative way to solve it: loop over the file twice.
new File("data/1a78de40-8f0a-1028-9c9e-db07163b51b2.csv").eachLine{l->p=l.split(",");v1=g.V().has('uuid',p[0])?:graph.addVertex('uuid',p[0]);v2=g.V().has('uuid',p[1])?:graph.addVertex('uuid',p[1]);}

new File("data/1a78de40-8f0a-1028-9c9e-db07163b51b2.csv").eachLine{l->p =l.split(",");v1=g.V().has('uuid',p[0]).next();v2=g.V().has('uuid',p[1]).next(); v1.addEdge(p[4],v2)}

On Tuesday, August 15, 2017 at 9:42:40 PM UTC+8, hu junjie wrote:

I have the Gremlin command below, and it works fine.

new File("data/1a78de40-8f0a-1028-9c9e-db07163b51b2.csv").eachLine{l->p=l.split(",");v1=g.V().has('uuid',p[0])?:graph.addVertex('uuid',p[0]);v2=g.V().has('uuid',p[1])?:graph.addVertex('uuid',p[1]);}

But the command below doesn't work:

new File("data/1a78de40-8f0a-1028-9c9e-db07163b51b2.csv").eachLine{l->p=l.split(",");v1=g.V().has('uuid',p[0])?:graph.addVertex('uuid',p[0]);v2=g.V().has('uuid',p[1])?:graph.addVertex('uuid',p[1]);v1.addEdge(p[4],v2)}

Here is the error:

gremlin> new File("data/1a78de40-8f0a-1028-9c9e-db07163b51b2.csv").eachLine{l->p=l.split(",");v1=g.V().has('uuid',p[0])?:graph.addVertex('uuid',p[0]);v2=g.V().has('uuid',p[1])?:graph.addVertex('uuid',p[1]);v1.addEdge(p[4],v2)}

21:30:12 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(uuid = 1a78de40-8f0a-1028-9c9e-db07163b51b2)]. For better performance, use indexes
21:30:12 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(uuid = d803d140-8f0a-1028-98de-db07163b51b2)]. For better performance, use indexes
21:30:12 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(uuid = 1a78de40-8f0a-1028-9c9e-db07163b51b2)]. For better performance, use indexes
21:30:12 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(uuid = 92df9f40-8f0a-1028-8723-db07163b51b2)]. For better performance, use indexes
No signature of method: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal.addEdge() is applicable for argument types: (java.lang.String, org.janusgraph.graphdb.vertices.StandardVertex) values: [CommunitiesMember, v[122908672]]
Type ':help' or ':h' for help.
Display stack trace? [yN]

The CSV file is below:
1a78de40-8f0a-1028-9c9e-db07163b51b2,d803d140-8f0a-1028-98de-db07163b51b2,2012-09-18T08:56:01Z,1,CommunitiesMember
1a78de40-8f0a-1028-9c9e-db07163b51b2,92df9f40-8f0a-1028-8723-db07163b51b2,2012-09-18T08:56:01Z,1,CommunitiesMember
1a78de40-8f0a-1028-9c9e-db07163b51b2,281edc40-3c20-102c-9a69-980191c9f99a,2012-09-18T08:56:01Z,1,CommunitiesMember
1a78de40-8f0a-1028-9c9e-db07163b51b2,878c73c0-8f0a-1028-91a1-db07163b51b2,2012-09-18T08:56:01Z,1,CommunitiesMember
1a78de40-8f0a-1028-9c9e-db07163b51b2,5427d240-9f1e-102c-9233-9c1aa9e13df3,2012-09-18T08:56:01Z,1,CommunitiesMember
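The error message names the cause: addEdge() was called on a DefaultGraphTraversal. When a vertex already exists, g.V().has('uuid', p[0]) is a non-empty traversal, so Groovy's ?: operator keeps the traversal itself rather than the vertex; the second loop of the two-pass version works because it calls .next(). (The WARN lines are a separate issue: no index covers uuid, so every lookup scans all vertices.) A library-free Java sketch of the two-pass structure, with a Map standing in for the graph; the class, the Result holder and the edge string format are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TwoPassCsvLoad {
    // Hypothetical result: uuid -> vertex id, plus edges rendered as "fromUuid -label-> toUuid".
    static final class Result {
        final Map<String, Long> vertexIds = new LinkedHashMap<>();
        final List<String> edges = new ArrayList<>();
    }

    static Result load(List<String> csvLines) {
        Result r = new Result();
        long nextId = 1;
        // Pass 1: get-or-create a vertex for the two uuid columns (p[0] and p[1]).
        for (String line : csvLines) {
            String[] p = line.split(",");
            for (String uuid : new String[] {p[0], p[1]}) {
                if (!r.vertexIds.containsKey(uuid)) {
                    r.vertexIds.put(uuid, nextId++);
                }
            }
        }
        // Pass 2: every endpoint now exists, so look up the vertex itself (the analogue
        // of .next()) and add an edge labelled with column p[4].
        for (String line : csvLines) {
            String[] p = line.split(",");
            if (r.vertexIds.get(p[0]) == null || r.vertexIds.get(p[1]) == null) {
                throw new IllegalStateException("missing endpoint for line: " + line);
            }
            r.edges.add(p[0] + " -" + p[4] + "-> " + p[1]);
        }
        return r;
    }
}
```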


How can I get all the file names under one folder using gremlin?

hu junjie <hjj...@...>

How can I get all the file names under one folder, and loop over each of them, using a Gremlin command?
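The Gremlin Console runs Groovy on the JVM, so the standard Java file APIs are available there; Groovy's own File.eachFile and eachFileMatch are shorter alternatives. A minimal Java sketch (class and method names are illustrative) that lists the .csv files directly under a folder, sorted by name, so each can then be fed to a per-file load loop such as the one in the CSV thread. The folder name "data" is taken from that thread and is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ListFolder {
    // Returns the regular files directly under dir whose names end with suffix, sorted by name.
    static List<Path> filesWithSuffix(Path dir, String suffix) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {   // Files.list is not recursive
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(suffix))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("data");
        if (!Files.isDirectory(dir)) {
            System.out.println("no such folder: " + dir.toAbsolutePath());
            return;
        }
        for (Path file : filesWithSuffix(dir, ".csv")) {
            System.out.println(file.getFileName());
            // ...load each file here, one line at a time.
        }
    }
}
```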

