Condition where-inV-is does not work
Anton Eroshenko <erosh...@...>
Hi, JanusGraph users.
I'm trying a simple query from the TinkerPop docs with my JanusGraph installation, but it does not work as expected. How is this possible?
gremlin> g.V(41099392).outE('LINK').inV()
==>v[110792]
==>v[81993864]
gremlin> g.V(41099392).outE('LINK').where(inV().is(V(110792)))
gremlin>
The last query returns nothing...
How can I filter vertices by a traversal? I'd appreciate any help.
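A possible rework, offered as an untested sketch: is() compares the incoming element against a constant or predicate rather than against another traversal, so one way to express "edges whose in-vertex is v[110792]" is to match on the id with hasId(); the ids below are the ones from the transcript above.

g.V(41099392).outE('LINK').where(inV().hasId(110792))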
Re: Running OLAP on HBase with SparkGraphComputer fails with Error Container killed by YARN for exceeding memory limits
Evgeniy Ignatiev <yevgeniy...@...>
Oh, I recall that we once tried to debug the same issue with JanusGraph-HBase; we had clear supernodes in the graph. None of our attempts at repartitioning, including analyzing the code of SparkGraphComputer and tinkering with it to make it work for partitioned vertices, were successful. Using Cassandra (the latest 3.x version at the time) did not lead to OOM, but it was noticeably slower than HBase when we used it with smaller graphs.
Best regards,
Evgenii Ignatev.
addE doesn't create more than 1 edge
Anton Eroshenko <erosh...@...>
I'm trying to link one vertex (let's say Activity) with two others (person). I expect two edges as a result, but the request below creates only one.
gremlin> g.V().hasLabel('Activity')
==>v[40984624]
gremlin> g.V().hasLabel('person').has('id', within('p1', 'p2'))
==>v[40996896]
==>v[41037952]
gremlin> g.V().hasLabel('Activity').addE('LINK').to(g.V().hasLabel('person').has('id', within('p1', 'p2')))
==>e[oe5mu-oefxs-b0np-oepeo][40984624-RESP->40996896]
Is this a bug, or am I missing something?
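This looks like TinkerPop behaviour rather than a JanusGraph-specific problem: when to() is given a child traversal, only the first vertex that traversal returns is used. A possible rework (an untested sketch reusing the labels and property values from the question) keeps the person vertices in the main traversal stream and attaches each edge with from():

g.V().hasLabel('Activity').as('a').
  V().hasLabel('person').has('id', within('p1', 'p2')).
  addE('LINK').from('a')

This should yield one LINK edge per matching person vertex.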
Re: Configuring Transaction Log feature
Sandeep Mishra <sandy...@...>
The code explains the behavior: the API sets the start time to null instead of Instant.now(), hence the different behaviour.
public LogProcessorBuilder setStartTimeNow() {
    this.startTime = null;
    return this;
}
On Saturday, December 12, 2020 at 10:26:59 PM UTC+8 Sandeep Mishra wrote:
Pawan, I was able to make your code work. The problem is setStartTimeNow(). Instead, use
setStartTime(Instant.now()) and test; it works. I have yet to explore the difference between the two APIs. Make sure to use a new log identifier when you test.
Regards, Sandeep
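For reference, a minimal sketch of the working registration in the Gremlin Console, adapted from Pawan's Java code further down this thread; logProcessor is assumed to be the LogProcessorFramework opened there, and 'TestLog2' is a made-up fresh log identifier:

import java.time.Instant
import org.janusgraph.core.log.ChangeProcessor

logProcessor.addLogProcessor('TestLog2').
    setProcessorIdentifier('TestLogCounter').
    setStartTime(Instant.now()).            // instead of setStartTimeNow()
    addProcessor({ tx, txId, changeState ->
        println "changeState--${changeState}"
    } as ChangeProcessor).
    build()

Writing transactions then need graph.buildTransaction().logIdentifier('TestLog2').start() so the processor has events to pick up.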
On Wednesday, December 9, 2020 at 8:54:17 PM UTC+8 shr...@... wrote: Hi Sandeep,
I think I have already added the line below to indicate that the processor should pull the details from now onwards. Is it not working?
"setStartTimeNow()"
Has anyone else faced the same thing in their Java code?
Thanks, Pawan
On Friday, 4 December 2020 at 16:22:51 UTC+5:30 sa...@... wrote: Pawan, can you check for the following in your logs: "Loaded unidentified ReadMarker start time...". It seems your ReadMarker is starting from 1970, so it tries to read all changes since then.
Regards, Sandeep On Saturday, November 28, 2020 at 8:48:18 PM UTC+8 shr...@... wrote: One correction to the last post, in the line below.
JanusGraphTransaction tx = graph.buildTransaction().logIdentifier("TestLog").start();
On Saturday, 28 November 2020 at 18:16:09 UTC+5:30 Pawan Shriwas wrote:
Hi Sandeep,
Please see the Java code and properties information below, which I am trying locally with Cassandra (cql) as the backend. This code does not give me the change-log events that I do get via the Gremlin console with the same script and properties. Please let me know if anything needs to be modified here in the code or properties.
<!-- Java Code -->
package com.example.graph;

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.core.JanusGraphVertex;
import org.janusgraph.core.log.ChangeProcessor;
import org.janusgraph.core.log.ChangeState;
import org.janusgraph.core.log.LogProcessorFramework;
import org.janusgraph.core.log.TransactionId;

public class TestLog {

    public static void listenLogsEvent() {
        JanusGraph graph = JanusGraphFactory.open("/home/ist/Downloads/IM/jgraphdb_local.properties");
        LogProcessorFramework logProcessor = JanusGraphFactory.openTransactionLog(graph);

        logProcessor.addLogProcessor("TestLog").
            setProcessorIdentifier("TestLogCounter").
            setStartTimeNow().
            addProcessor(new ChangeProcessor() {
                @Override
                public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
                    System.out.println("tx--" + tx.toString());
                    System.out.println("txId--" + txId.toString());
                    System.out.println("changeState--" + changeState.toString());
                }
            }).
            build();

        for (int i = 0; i <= 10; i++) {
            System.out.println("going to add =" + i);
            JanusGraphTransaction tx = graph.buildTransaction().logIdentifier("PawanTestLog").start();
            JanusGraphVertex a = tx.addVertex("TimeL");
            a.property("type", "HOLD");
            a.property("serialNo", "XS31B4");
            tx.commit();
            System.out.println("Vertex committed =" + a.toString());
        }
    }

    public static void main(String[] args) {
        System.out.println("starting main");
        listenLogsEvent();
    }
}

<!----- graph properties ------->
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend = cql
storage.hostname = localhost
storage.cql.keyspace=janusgraphcql
query.fast-property = true
storage.lock.wait-time=10000
storage.batch-loading=true
Thanks in advance.
Thanks, Pawan
On Saturday, 28 November 2020 at 16:19:20 UTC+5:30 sa...@... wrote: Pawan, can you elaborate more on the program you are trying to embed the script in? Regards, Sandeep
On Sat, 28 Nov 2020, 13:48 Pawan Shriwas, < shr...@...> wrote: Hey Jason,
The same thing happens for me as well: the script above works well in the Gremlin console, but when we use it in Java we do not get anything in the process() section as a callback. Could you help with this?
On Wednesday, 7 February 2018 at 20:28:41 UTC+5:30 Jason Plurad wrote:
It means that it will use the 'storage.backend' value as the storage. See the code in GraphDatabaseConfiguration.java. It looks like your only choice is 'default', and it seems the option is there for the future possibility of using a different backend. The code in the docs seemed to work OK, other than a minor change in the setStartTime() parameters. You can cut and paste this code into the Gremlin Console to use with the prepackaged distribution.
import java.util.concurrent.atomic.*;
import org.janusgraph.core.log.*;
import java.util.concurrent.*;
graph = JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties');
totalHumansAdded = new AtomicInteger(0);
totalGodsAdded = new AtomicInteger(0);
logProcessor = JanusGraphFactory.openTransactionLog(graph);
logProcessor.addLogProcessor("addedPerson").
    setProcessorIdentifier("addedPersonCounter").
    setStartTime(Instant.now()).
    addProcessor(new ChangeProcessor() {
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
            for (v in changeState.getVertices(Change.ADDED)) {
                if (v.label().equals("human")) totalHumansAdded.incrementAndGet();
                System.out.println("total humans = " + totalHumansAdded);
            }
        }
    }).
    addProcessor(new ChangeProcessor() {
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
            for (v in changeState.getVertices(Change.ADDED)) {
                if (v.label().equals("god")) totalGodsAdded.incrementAndGet();
                System.out.println("total gods = " + totalGodsAdded);
            }
        }
    }).
    build()
tx = graph.buildTransaction().logIdentifier("addedPerson").start();
u = tx.addVertex(T.label, "human");
u.property("name", "proteros");
u.property("age", 36);
tx.commit();
If you inspect the keyspace in Cassandra afterwards, you'll see that a separate table is created for "ulog_addedPerson". Did you have some example code of what you are attempting?
On Wednesday, February 7, 2018 at 5:55:58 AM UTC-5, Sandeep Mishra wrote: Hi Guys,
We are trying to use the transaction log feature of JanusGraph, but it is not working as expected. No callback is received at:
public void process(JanusGraphTransaction janusGraphTransaction, TransactionId transactionId, ChangeState changeState) {
The JanusGraph documentation says the value for log.[X].backend is 'default'. I am not sure what exactly that means. Does it mean HBase, which is being used as the backend for the data?
Please let me know, if anyone has configured it.
Thanks and Regards, Sandeep Mishra
Re: Property with multiple data types
Hi Laura,
The JanusGraph storage backends can store many isolated graphs (see e.g. the storage.cql.keyspace configuration property). However, it is not possible to have edges between vertices from different graphs, so I guess this is not what you are looking for.
Your question is valid, and, by coincidence, it is currently discussed on the developer's list of Apache TinkerPop: https://lists.apache.org/thread.html/rd1b6f842b806dd9bca18d91faced3db14ab6cf4e55c9d762b9657d5e%40%3Cdev.tinkerpop.apache.org%3E
Best wishes, Marc
Re: Running OLAP on HBase with SparkGraphComputer fails with Error Container killed by YARN for exceeding memory limits
Thanks Marc
On Friday, December 11, 2020 at 3:40:25 PM UTC+8 HadoopMarc wrote:
Hi Roy,
I think I would first check whether the skew is absent when you count the rows by reading the HBase table directly from Spark (so, without using JanusGraph), e.g.:
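The snippet referred to here seems to be missing from the archived message. A rough sketch of such a direct count, under the assumptions of a Groovy shell with the Spark and HBase client jars plus hbase-site.xml on the classpath, a reachable YARN cluster, and the table name ky415 taken from the region listing further down:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext

sc = new JavaSparkContext(new SparkConf().setAppName('hbase-row-count').setMaster('yarn'))
hbaseConf = HBaseConfiguration.create()                 // picks up hbase-site.xml
hbaseConf.set(TableInputFormat.INPUT_TABLE, 'ky415')
rows = sc.newAPIHadoopRDD(hbaseConf, TableInputFormat, ImmutableBytesWritable, Result)
println rows.count()   // per-task input sizes in the Spark UI show whether the skew is already present at the HBase level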
If this works all right, then you know that somehow in the JanusGraph HBaseInputFormat the mappers do not get the right key ranges to read from.
I also thought about the storage.hbase.region-count property of janusgraph-hbase. If you specified this as 40 while creating the graph, janusgraph-hbase would create many small regions that would be compacted by HBase later on. But maybe this creates a different structure in the row keys that can be leveraged by hbase.mapreduce.tableinput.mappers.per.region.
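For what it's worth, a sketch of setting that property; it only takes effect when the graph (and hence the HBase table) is created for the first time, and the hostname value here is a placeholder:

graph = JanusGraphFactory.build().
    set('storage.backend', 'hbase').
    set('storage.hostname', '127.0.0.1').
    set('storage.hbase.region-count', 40).
    open()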
Best wishes, Marc
On Wednesday, December 9, 2020 at 17:16:35 UTC+1, Roy Yu wrote:
Hi Marc,
The parameter hbase.mapreduce.tableinput.mappers.per.region can be effective. I set it to 40, and there are 40 tasks processing every region. But here comes the new problem: data skew. I use g.E().count() to count all the edges of the graph. While counting one region, one Spark task held all 2.6GB of data, while the other 39 tasks held no data. The task failed again. I checked my data: there are some vertices which have more than 1 million incident edges. So I tried to solve this problem using a vertex cut (https://docs.janusgraph.org/advanced-topics/partitioning/); my graph schema is something like [ mgmt.makeVertexLabel('product').partition().make() ]. But when I used MR to load data into the new graph, it took more than 10 times as long as the attempt without partition(). From the HBase table detail page, I found the data-loading process was busy reading data from and writing data to the first region. The first region became a hot spot. I guess it relates to vertex ids. Could you help me again? On Tuesday, December 8, 2020 at 3:13:42 PM UTC+8 HadoopMarc wrote:
Hi Roy,
As I mentioned, I did not keep up with possibly new janusgraph-hbase features. From the HBase source, I see that HBase now has a "hbase.mapreduce.tableinput.mappers.per.region" config parameter.
It should not be too difficult to adapt the janusgraph HBaseInputFormat to leverage this feature (or maybe it even works without change???).
Best wishes,
Marc
On Tuesday, December 8, 2020 at 04:21:19 UTC+1, Roy Yu wrote:
you seem to run on cloud infra that reduces your requested 40 Gb to 33 Gb (see https://databricks.com/session_na20/running-apache-spark-on-kubernetes-best-practices-and-pitfalls). Fact of life.
---------------------
Sorry Marc, I misled you. The error message was generated when I set spark.executor.memory to 30G; when that failed, I increased spark.executor.memory to 40G, and it failed as well. I felt desperate and came here to ask for help.
On Tuesday, December 8, 2020 at 10:35:19 AM UTC+8 Roy Yu wrote:
Hi Marc
Thanks for your immediate response. I've tried setting spark.yarn.executor.memoryOverhead=10G and re-running the task, and it still failed. From the Spark task UI, I saw that 80% of the processing time is Full GC time. As you said, the 2.6GB (GZ compressed) region exploding is my root cause. Now I'm trying to reduce my region size to 1GB; if that still fails, I'm going to configure the HBase HFiles to not use a compressed format. This was my first time running JanusGraph OLAP, and I think this is a common problem, as an HBase region size of 2.6GB (compressed) is not large; 20GB is very common in our production. If the community does not solve the problem, the JanusGraph HBase-based OLAP solution cannot be adopted by other companies either.
On Tuesday, December 8, 2020 at 12:40:40 AM UTC+8 HadoopMarc wrote:
Hi Roy,
There seem to be three things bothering you here:
- you did not specify spark.yarn.executor.memoryOverhead, as the exception message says. Easily solved.
- you seem to run on cloud infra that reduces your requested 40 Gb to 33 Gb (see https://databricks.com/session_na20/running-apache-spark-on-kubernetes-best-practices-and-pitfalls). Fact of life.
- the janusgraph HBaseInputFormat uses entire HBase regions as hadoop partitions, which are fed into spark tasks. The 2.6Gb region size is for compressed binary data which explodes when expanded into java objects. This is your real problem.
I did not follow the latest status of janusgraph-hbase features for the HBaseInputFormat, but you have to somehow use spark with smaller partitions than an entire HBase region. A long time ago, I had success with skipping the HBaseInputFormat and having spark executors connect to JanusGraph themselves. That is not a quick solution, though.
Best wishes,
Marc
On Monday, December 7, 2020 at 14:10:55 UTC+1, Roy Yu wrote:
Error message: ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 33.1 GB of 33 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
graph config:
spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=500 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/mnt/data_1/log/spark2/gc-spark%p.log
spark.executor.cores=1
spark.executor.memory=40960m
spark.executor.instances=3
Region info: hdfs dfs -du -h /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc
67      134     /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.regioninfo
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/.tmp
2.6 G   5.1 G   /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/e
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/f
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/g
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/h
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/i
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/l
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/m
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/recovered.edits
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/s
0       0       /apps/hbase/data/data/default/ky415/f069fafb3ee51d6a2e5bc2377b468bcc/t
root@~$
Can anybody help me?
Re: Centric Indexes failing to support all conditions for better performance.
Thank you, looking forward to having profile() with such information added. Cheers, CM
Re: Property with multiple data types
Laura Morales <laur...@...>
Maybe I'm completely wrong, but would I be right to say that "labels" are the equivalent of Java classes? Like, one label represents a Java class and graph properties represent a class's properties? So, saying that a node has label L would be like saying that a certain Java object is of class C? (That's why there's only one label per node.) I was picturing labels as arbitrary strings that are attached to a node.
Then in Java, classes are namespaced to avoid collision. If my analogy is true, what would be the equivalent of namespaces with Janus?
Re: Centric Indexes failing to support all conditions for better performance.
Hi Christopher,
I don't have any workaround in mind except testing and comparing query latencies.
I have created https://github.com/JanusGraph/janusgraph/issues/2283 which hopefully can be addressed before the next release. That being said, there is no planned date for the next release yet.
Btw as I mentioned earlier, if you use "hasNot" it almost never leverages index - no matter if it's a mixed or composite or vertex-centric index.
Best regards, Boxuan
On Monday, December 14, 2020 at 11:56:57 AM UTC+8, chrism wrote:
toggle quoted message
Show quoted text
Thank you Boxuan Li,
It is obvious that you are an expert. Is there any other way, apart from isFitted=true, to know whether an index is used or not? (Even debugging the JanusGraph server or Cassandra would do.)
We need to construct a Gremlin query that utilizes these indexes in full, and always. The problem is just what to type, as our implementation requires more complicated conditions to match than the ones above. Using the above as a sample, it would be: (rating >= value AND time < value) OR hasNot(time) - meaning that "time" was not specified. What is visible from profile() is that we cannot use coalesce() or or() steps, and trying all kinds of workarounds cannot be verified easily, having isFitted=false and no other "good" indication of whether indexes are used.
Cheers, Christopher
On Sunday, December 13, 2020 at 7:24:13 PM UTC+11 li...@... wrote: Hi Christopher,
isFitted = true basically means no in-memory filtering is needed. If you see isFitted = false, it does not necessarily mean vertex-centric indexes are not used. It could be the case that some vertex-centric index is used, but further in-memory filtering is still needed. If you see isFitted = false, it does not necessarily mean any index is used. It could be the case that you are fetching all edges of a given vertex.
I totally understand your confusion because the documentation does not explain how the vertex-centric index is built. In JanusGraph, vertices and edges are stored in the “edgestore” store, while composite indexes are stored in the “graphindex” store. Mixed indexes are stored in the external indexing backend.
Roughly speaking, If you don’t have any vertex-centric index, then your edge is stored once for one endpoint. If you have one vertex-centric index, then applicable edges are stored twice. If you have two vertex-centric indexes, then applicable edges are stored three times… These edges, although seemingly duplicate, have different “sort key”s which conform to corresponding vertex-centric indexes. Let’s say you have built an “battlesByRating” vertex-centric index based on the property “rating”, then apart from the ordinary edge, JanusGraph creates an additional edge whose “sort key” is the rating value. Because the “column” is sorted in the underlying data storage (e.g. “column” in JanusGraph model is mapped to “clustering column” in Cassandra), you essentially gain the ability to search an index by “rating” value/range.
What happens when your vertex-centric index has two properties like the following?
> mgmt.buildEdgeIndex(battled, 'battlesByRatingAndTime', Direction.OUT, Order.asc, rating, time)
Now your “sort key” is a combination of “rating” and “time” (note “rating” comes before “time”). Under this vertex-centric index, “sort key”s look like this:
(rating=1, time=2), (rating=1, time=3), (rating=2, time=1), (rating=2, time=5), (rating=4, time=2), …
This explains why isFitted = true when your query is has('rating', 5.0).has('time', inside(10, 50)) but not when your query is has('time', 5.0).has('rating', inside(10, 50)). Again, note that isFitted = false does not necessarily mean your query is not optimized by a vertex-centric index. I think the profiler should be improved to state whether and which vertex-centric index is used.
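(For concreteness, a minimal sketch of how such a two-key vertex-centric index could be defined; the property-key datatypes here are assumptions following the docs' battled example, and Order.asc matches the quoted line above, while older releases use Order.incr:

mgmt = graph.openManagement()
rating = mgmt.makePropertyKey('rating').dataType(Double.class).make()
time = mgmt.makePropertyKey('time').dataType(Integer.class).make()
battled = mgmt.makeEdgeLabel('battled').signature(rating, time).make()
mgmt.buildEdgeIndex(battled, 'battlesByRatingAndTime', Direction.OUT, Order.asc, rating, time)
mgmt.commit()

// a query that fits this index: rating equality first, then a time range
// g.V(h).outE('battled').has('rating', 5.0).has('time', inside(10, 50)).inV()
)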
I am not quite sure about the case b) you mentioned. Seems it’s a design consideration but right now I cannot tell why it is there.
“hasNot" almost never uses indexes because JanusGraph cannot index something that does not exist. (Note that “null” value is not valid in JanusGraph).
Hope this helps.
Best regards, Boxuan On Dec 10, 2020, at 11:01 AM, chrism < cm...@...> wrote:
is describing usage of a Vertex Centric Index [edge=battled + properties=(rating,time)]:
g.V(h).outE('battled').has('rating', 5.0).has('time', inside(10, 50)).inV()
From my understanding, profile() of the above reports isFitted=true to indicate that the backend query delivered all results for the conditions: condition=(rating = 0.5 AND time > 10 AND time < 50 AND type[battled])
Two things are obvious from the above: a centric index supports multiple property keys, and equality and range/interval constraints. However, isFitted is false for all kinds of conditions or combinations which do not really break the above rules and are still range constraints:
a) g.V(h).outE('battled').has('rating', lt(5.0)).has('time', inside(10, 50)).inV() // P.lt used for first key
b) g.V(h).outE('battled').has('rating', gt(5.0)) // P.gt used
c) g.V(h).outE('battled').or( hasNot('rating'), has('rating', eq(5.0)) ) // OrStep() used
Even b) can be "fitted" by has('rating', inside(5.0, Long.MAX_VALUE)). All of this is very confusing and probably not working as expected. What am I doing wrong? From my experience, only one property key can be used for query conditions with the index; the second is ignored.
Having isFitted=false does not really improve performance, from my understanding, when a single condition is used to fetch most of my edges and the rest have to be filtered in memory, as stated by the implementation of BasicVertexCentricQueryBuilder.java. Are there limitations not described in the JG docs? Is it a glitch?
Can you offer an explanation of how to utilize centric indexes for edges with full support?
Christopher
Re: Property with multiple data types
Laura Morales <laur...@...>
Thank you for your further comments. I'm still a bit confused though. Janus advertises itself as a database for huge graphs. But I'm asking myself if it means huge "homogeneous" graphs (ie. a simple schema but a lot of nodes/edges) or huge "dishomogeneous" graphs (ie. with lots of nodes/edges but also with a complex schema). With really big graphs I think it's reasonable to assume that different nodes will want to use the same property key but with different data types. Labels however don't really help with this use case, because there can only be 1 per node, and maybe some nodes want to use both types. How would you design the schema of a big "dishomogeneous" graph? Like a schema that can describe 100s of different domains in a single graph. Is Object.class the only way?
"age": 23), but other nodes on the other far-side of my big graph might want/need/require to use a String type (eg. "age": "twenty-seven"). Is this configuration possible with Janus? Or do I *have to* use two different names such as age_int and age_string? -- You received this message because you are subscribed to the Google Groups "JanusGraph users" group. To unsubscribe from this group and stop receiving emails from it, send an email to janusgr...@...[mailto:janusgr...@...]. To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/0b84be68-3688-46fe-a104-32baef119e2an%40googlegroups.com[ https://groups.google.com/d/msgid/janusgraph-users/0b84be68-3688-46fe-a104-32baef119e2an%40googlegroups.com?utm_medium=email&utm_source=footer][ https://groups.google.com/d/msgid/janusgraph-users/0b84be68-3688-46fe-a104-32baef119e2an%40googlegroups.com%5Bhttps://groups.google.com/d/msgid/janusgraph-users/0b84be68-3688-46fe-a104-32baef119e2an%40googlegroups.com?utm_medium=email&utm_source=footer]]. -- You received this message because you are subscribed to the Google Groups "JanusGraph users" group. To unsubscribe from this group and stop receiving emails from it, send an email to janusgra...@...[mailto:janusgra...@...]. To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/24ee1b44-3501-4d40-abef-b32aa345c959n%40googlegroups.com[ https://groups.google.com/d/msgid/janusgraph-users/24ee1b44-3501-4d40-abef-b32aa345c959n%40googlegroups.com?utm_medium=email&utm_source=footer].
Re: Property with multiple data types
Hi Laura,
Things are a bit different than you ask: - a vertex has a single label only
- a property key has a single datatype only, but it can be Object.class, see https://docs.janusgraph.org/basics/schema/#property-key-data-type
- indices can have a label constraint, but these are not helpful if you want to mix the datatypes in a property for the same vertex
- I cannot predict well how the various janusgraph parts will behave when mixing up real integers and "string-integers" in a property key of the Object.class datatype. I guess that the gremlin traversals will have problems, while an indexing backend for MixedIndices probably can deal with it. The ref docs definitely advise to use the basic datatypes and spare yourself future headaches (so, unify the datatypes on ingestion).
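A minimal sketch of the Object.class option from the second point above, using the 'age' key from the question (untested):

mgmt = graph.openManagement()
mgmt.makePropertyKey('age').dataType(Object.class).make()
mgmt.commit()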
Best wishes, Marc
Re: Is there a standard, human-friendly, serialization format?
Evgeniy Ignatiev <yevgeniy...@...>
On 14.12.2020 09:22, Laura Morales wrote: All the examples that I see in the Janus documentation seem to use Groovy. Instructions such as JanusGraphFactory.open(), graph.openManagement(), mgmt.makeEdgeLabel() etc. Is there any human-friendly plaintext format that I can use to write my graph with, and then load into Janus? In practical terms what I would like to do is this:
1. write my graph in a text file, all nodes and edges, and the schema too. So the format should be human-friendly and easy to edit manually. Hopefully not XML. 2. load this graph into Janus by asking Janus to read my graph file. Not in-memory though, I mean to create a new persistent database that is always there when Janus starts.
Is there a standard, human-friendly, serialization format?
Laura Morales <laur...@...>
All the examples that I see in the Janus documentation seem to use Groovy. Instructions such as JanusGraphFactory.open(), graph.openManagement(), mgmt.makeEdgeLabel() etc. Is there any human-friendly plaintext format that I can use to write my graph with, and then load into Janus? In practical terms what I would like to do is this:
1. Write my graph in a text file, all nodes and edges, and the schema too. So the format should be human-friendly and easy to edit manually. Hopefully not XML.
2. Load this graph into Janus by asking Janus to read my graph file. Not in-memory though; I mean to create a new persistent database that is always there when Janus starts.
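One possible route, sketched as an assumption rather than an established recipe: keep the data in a GraphSON (JSON) or GraphML (XML) text file and load it with TinkerPop's io() step against a persistent JanusGraph instance; the JanusGraph schema itself still has to be created separately with the management API. The properties file name comes from elsewhere in this digest, and my-graph.json is a placeholder:

graph = JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties')   // persistent backend, not inmemory
g = graph.traversal()
g.io('my-graph.json').read().iterate()   // .json selects GraphSON; a .xml suffix would select GraphML
graph.tx().commit()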
Re: Property with multiple data types
Laura Morales <laur...@...>
Thank you Marc, I think it does indeed! If I understand correctly, I can use labels to "namespace" my nodes, or in other words as a way to identify subgraphs.
If I have a node with 2 labels instead, say label1 and label2, I can create 2 indices for the same node, right? That is, an index for label1.age (Integer) and an index for label2.age (String), both indices containing the same node. In this scenario I should be allowed to add 2 types of properties to the same node, one containing an Integer and the other one containing a String. Then query by choosing a specific label. Does this work? Can I do something like this?
Re: Property with multiple data types
Hi Laura,
Good that you pay close attention to understanding indices in JanusGraph because they are essential to proper use. Does the following section of the ref docs answers your question?
https://docs.janusgraph.org/index-management/index-performance/#label-constraint
Best wishes, Marc
On Sunday, December 13, 2020 at 16:30:19 UTC+1, Laura Morales wrote:
I'm new to Janus and LPGs. I have a question after reading the Janus documentation. As far as I understand, edge labels as well as properties (for both nodes and edges) are indexed globally. What happens when I have a sufficiently large graph, and completely unrelated and separate nodes want to use a property with the same name but holding different data types? For example, a property called "age" could be used by some nodes with an Integer type (eg. "age": 23), but other nodes on the far side of my big graph might want/need/require a String type (eg. "age": "twenty-seven"). Is this configuration possible with Janus? Or do I *have to* use two different names such as age_int and age_string?
|
|
Re: Centric Indexes failing to support all conditions for better performance.
Thank you Boxuan Li. It is obvious that you are an expert. Is there any other way, apart from isFitted=true, to know whether an index is used or not? (It may even be debugging the JanusGraph server or Cassandra.)
We need to construct a Gremlin query that utilizes these indexes in full, and always; the problem is just what to type, as our implementation requires more complicated conditions than the above to match. Using the above as a sample, it would be: (rating >= value AND time < value) OR hasNot(time) - meaning that "time" was not specified. What is visible from profile() is that we cannot use coalesce() or or() steps, and trying all kinds of workarounds cannot be verified easily, having isFitted=false and no other "good" indication that indexes are used.
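For illustration, the condition described above would look something like this in Gremlin (a sketch only; r and t stand in for the actual threshold values, and h is the start vertex as in the earlier examples):

// Sketch only: placeholder thresholds
r = 3.0; t = 50
g.V(h).outE('battled').
  or(__.has('rating', gte(r)).has('time', lt(t)),
     __.hasNot('time')).
  inV()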
Cheers, Christopher
On Sunday, December 13, 2020 at 7:24:13 PM UTC+11 li...@... wrote:
Hi Christopher,
isFitted = true basically means no in-memory filtering is needed. If you see isFitted = false, it does not necessarily mean vertex-centric indexes are not used: it could be the case that some vertex-centric index is used, but further in-memory filtering is still needed. Conversely, if you see isFitted = true, it does not necessarily mean any index is used: it could be the case that you are simply fetching all edges of a given vertex.
I totally understand your confusion because the documentation does not explain how the vertex-centric index is built. In JanusGraph, vertices and edges are stored in the “edgestore” store, while composite indexes are stored in the “graphindex” store. Mixed indexes are stored in the external indexing backend.
Roughly speaking, if you don’t have any vertex-centric index, then your edge is stored once per endpoint. If you have one vertex-centric index, then applicable edges are stored twice. If you have two vertex-centric indexes, then applicable edges are stored three times… These edges, although seemingly duplicate, have different “sort key”s which conform to the corresponding vertex-centric indexes. Let’s say you have built a “battlesByRating” vertex-centric index based on the property “rating”; then, apart from the ordinary edge, JanusGraph creates an additional edge whose “sort key” is the rating value. Because the “column” is sorted in the underlying data storage (e.g. “column” in the JanusGraph model is mapped to “clustering column” in Cassandra), you essentially gain the ability to search an index by “rating” value/range.
What happens when your vertex-centric index has two properties like the following?
> mgmt.buildEdgeIndex(battled, 'battlesByRatingAndTime', Direction.OUT, Order.asc, rating, time)
Now your “sort key” is a combination of “rating” and “time” (note “rating” comes before “time”). Under this vertex-centric index, “sort key”s look like this:
(rating=1, time=2), (rating=1, time=3), (rating=2, time=1), (rating=2, time=5), (rating=4, time=2), …
This explains why isFitted = true when your query is has('rating', 5.0).has('time', inside(10, 50)) but not when your query is has('time', 5.0).has('rating', inside(10, 50)). Again, note that isFitted = false does not necessarily mean your query is not optimized by a vertex-centric index. I think the profiler shall be improved to state whether and which vertex-centric index is used.
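As a sketch of how this shows up in practice (assuming the schema and sample data from the docs' vertex-centric index example, where 'rating' has been added to the battled edges), compare the two orderings with profile():

h = g.V().has('name', 'hercules').next()
// expect isFitted=true: 'rating' is an equality on the leading sort key, so 'time' can use the sorted range
g.V(h).outE('battled').has('rating', 5.0).has('time', inside(10, 50)).profile()
// expect isFitted=false: 'time' is not the leading key, so in-memory filtering remains
g.V(h).outE('battled').has('time', 5.0).has('rating', inside(10, 50)).profile()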
I am not quite sure about the case b) you mentioned. Seems it’s a design consideration but right now I cannot tell why it is there.
“hasNot" almost never uses indexes because JanusGraph cannot index something that does not exist. (Note that “null” value is not valid in JanusGraph).
Hope this helps.
Best regards, Boxuan
On Dec 10, 2020, at 11:01 AM, chrism <cm...@...> wrote:
The JanusGraph documentation describes usage of the Vertex Centric Index [edge=battled + properties=(rating,time)]:
g.V(h).outE('battled').has('rating', 5.0).has('time', inside(10, 50)).inV()
From my understanding profile() of above is reporting \_isFitted=true to indicate that backend-query delivered all results as conditions: \_condition=(rating = 0.5 AND time > 10 AND time < 50 AND type[battled])
Two things are obvious from the above: the centric index supports multiple property keys, and both equality and range/interval constraints. However, isFitted is false for all kinds of conditions or combinations which do not really break the above rules and still use range constraints:
a) g.V(h).outE('battled').has('rating',lt(5.0)).has('time', inside(10, 50)).inV() // P.lt used for first key b) g.V(h).outE('battled').has('rating',gt(5.0)) // P.gt used c) g.V(h).outE('battled').or( hasNot('rating'), has('rating',eq(5.0)) ) // OrStep() used
Even b) can be "fitted" by has('rating',inside(5.0,Long.MAX_VALUE)) all that is very confusing, and probably not working as expected, what I am doing wrong? as from my experience only one property key can be used for query conditions and using index, the second is ignored.
Having isFitted=false is not really improving performance, from my understanding, when one only condition allows to get most of my edges and is asking to filter them in memory, as this is stated by implementation of BasicVertexCentricQueryBuilder.java. Are there limitations not described in the JG doco? It is a glitch?
Can you offer explanation how to utilize Centric Indexes for edges in full support?
Christopher
|
|
Property with multiple data types
Laura Morales <laur...@...>
I'm new to Janus and LPGs. I have a question after reading the Janus documentation. As far as I understand, edge labels as well as properties (for both nodes and edges) are indexed globally. What happens when I have a sufficiently large graph, and completely unrelated and separate nodes want to use a property with the same name but that holds different data types? For example, a property called "age" could be used by some nodes with an Integer type (e.g. "age": 23), but other nodes on the far side of my big graph might want/need/require a String type (e.g. "age": "twenty-seven"). Is this configuration possible with Janus? Or do I *have to* use two different names such as age_int and age_string?
|
|
Re: Centric Indexes failing to support all conditions for better performance.
Hi Christopher,
isFitted = true basically means no in-memory filtering is needed. If you see isFitted = false, it does not necessarily mean vertex-centric indexes are not used: it could be the case that some vertex-centric index is used, but further in-memory filtering is still needed. Conversely, if you see isFitted = true, it does not necessarily mean any index is used: it could be the case that you are simply fetching all edges of a given vertex.
I totally understand your confusion because the documentation does not explain how the vertex-centric index is built. In JanusGraph, vertices and edges are stored in the “edgestore” store, while composite indexes are stored in the “graphindex” store. Mixed indexes are stored in the external indexing backend.
Roughly speaking, if you don’t have any vertex-centric index, then your edge is stored once per endpoint. If you have one vertex-centric index, then applicable edges are stored twice. If you have two vertex-centric indexes, then applicable edges are stored three times… These edges, although seemingly duplicate, have different “sort key”s which conform to the corresponding vertex-centric indexes. Let’s say you have built a “battlesByRating” vertex-centric index based on the property “rating”; then, apart from the ordinary edge, JanusGraph creates an additional edge whose “sort key” is the rating value. Because the “column” is sorted in the underlying data storage (e.g. “column” in the JanusGraph model is mapped to “clustering column” in Cassandra), you essentially gain the ability to search an index by “rating” value/range.
What happens when your vertex-centric index has two properties like the following?
> mgmt.buildEdgeIndex(battled, 'battlesByRatingAndTime', Direction.OUT, Order.asc, rating, time)
Now your “sort key” is a combination of “rating” and “time” (note “rating” comes before “time”). Under this vertex-centric index, “sort key”s look like this:
(rating=1, time=2), (rating=1, time=3), (rating=2, time=1), (rating=2, time=5), (rating=4, time=2), …
This explains why isFitted = true when your query is has('rating', 5.0).has('time', inside(10, 50)) but not when your query is has('time', 5.0).has('rating', inside(10, 50)). Again, note that isFitted = false does not necessarily mean your query is not optimized by a vertex-centric index. I think the profiler shall be improved to state whether and which vertex-centric index is used.
I am not quite sure about the case b) you mentioned. Seems it’s a design consideration but right now I cannot tell why it is there.
“hasNot" almost never uses indexes because JanusGraph cannot index something that does not exist. (Note that “null” value is not valid in JanusGraph).
Hope this helps.
Best regards, Boxuan
On Dec 10, 2020, at 11:01 AM, chrism < cmil...@...> wrote:
The JanusGraph documentation describes usage of the Vertex Centric Index [edge=battled + properties=(rating,time)]:
g.V(h).outE('battled').has('rating', 5.0).has('time', inside(10, 50)).inV()
From my understanding profile() of above is reporting \_isFitted=true to indicate that backend-query delivered all results as conditions: \_condition=(rating = 0.5 AND time > 10 AND time < 50 AND type[battled])
Two things are obvious from the above: the centric index supports multiple property keys, and both equality and range/interval constraints. However, isFitted is false for all kinds of conditions or combinations which do not really break the above rules and still use range constraints:
a) g.V(h).outE('battled').has('rating',lt(5.0)).has('time', inside(10, 50)).inV() // P.lt used for first key b) g.V(h).outE('battled').has('rating',gt(5.0)) // P.gt used c) g.V(h).outE('battled').or( hasNot('rating'), has('rating',eq(5.0)) ) // OrStep() used
Even b) can be "fitted" by has('rating',inside(5.0,Long.MAX_VALUE)) all that is very confusing, and probably not working as expected, what I am doing wrong? as from my experience only one property key can be used for query conditions and using index, the second is ignored.
Having isFitted=false is not really improving performance, from my understanding, when one only condition allows to get most of my edges and is asking to filter them in memory, as this is stated by implementation of BasicVertexCentricQueryBuilder.java. Are there limitations not described in the JG doco? It is a glitch?
Can you offer explanation how to utilize Centric Indexes for edges in full support?
Christopher
|
|
Re: Configuring Transaction Log feature
Sandeep Mishra <sandy...@...>
Pawan, I was able to make your code work. The problem is "setStartTimeNow()". Instead, use setStartTime(Instant.now()) and test. It works. I am yet to explore the difference between the two APIs. Make sure to use a new log identifier to test.
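For clarity, a minimal sketch of the working variant of Pawan's builder chain (the log identifier and printout are illustrative; add an import for java.time.Instant):

// Sketch: same builder chain, but with an explicit start time and a fresh log identifier
logProcessor.addLogProcessor("TestLog2").
    setProcessorIdentifier("TestLogCounter").
    setStartTime(Instant.now()).    // instead of setStartTimeNow()
    addProcessor(new ChangeProcessor() {
        @Override
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
            System.out.println("changeState--" + changeState.toString());
        }
    }).
    build();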
Regards, Sandeep
On Wednesday, December 9, 2020 at 8:54:17 PM UTC+8 shr...@... wrote:
Hi Sandeep,
I think I have already added the below line to indicate that it should pull the details from now onwards in the processor. Is it not working?
"setStartTimeNow()"
Has anyone else faced the same thing in their Java code?
Thanks, Pawan
On Friday, 4 December 2020 at 16:22:51 UTC+5:30 sa...@... wrote: Pawan, can you check for the following in your logs: "Loaded unidentified ReadMarker start time...". It seems your ReadMarker is starting from 1970, so it tries to read changes since then.
Regards, Sandeep On Saturday, November 28, 2020 at 8:48:18 PM UTC+8 shr...@... wrote: One correction to the last post, in the below line.
JanusGraphTransaction tx = graph.buildTransaction().logIdentifier("TestLog").start();
On Saturday, 28 November 2020 at 18:16:09 UTC+5:30 Pawan Shriwas wrote:
Hi Sandeep,
Please see the below Java code and properties information, which I am trying locally with Cassandra CQL as the backend. This code is not giving me the change log events which I can get via the Gremlin console with the same script and properties. Please let me know if anything needs to be modified here in the code or properties.
<!-- Java Code -->
package com.example.graph;

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.core.JanusGraphVertex;
import org.janusgraph.core.log.ChangeProcessor;
import org.janusgraph.core.log.ChangeState;
import org.janusgraph.core.log.LogProcessorFramework;
import org.janusgraph.core.log.TransactionId;

public class TestLog {

    public static void listenLogsEvent() {
        JanusGraph graph = JanusGraphFactory.open("/home/ist/Downloads/IM/jgraphdb_local.properties");
        LogProcessorFramework logProcessor = JanusGraphFactory.openTransactionLog(graph);

        logProcessor.addLogProcessor("TestLog").
            setProcessorIdentifier("TestLogCounter").
            setStartTimeNow().
            addProcessor(new ChangeProcessor() {
                @Override
                public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
                    System.out.println("tx--" + tx.toString());
                    System.out.println("txId--" + txId.toString());
                    System.out.println("changeState--" + changeState.toString());
                }
            }).
            build();

        for (int i = 0; i <= 10; i++) {
            System.out.println("going to add =" + i);
            JanusGraphTransaction tx = graph.buildTransaction().logIdentifier("PawanTestLog").start();
            JanusGraphVertex a = tx.addVertex("TimeL");
            a.property("type", "HOLD");
            a.property("serialNo", "XS31B4");
            tx.commit();
            System.out.println("Vertex committed =" + a.toString());
        }
    }

    public static void main(String[] args) {
        System.out.println("starting main");
        listenLogsEvent();
    }
}

<!----- graph properties------->
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=localhost
storage.cql.keyspace=janusgraphcql
query.fast-property=true
storage.lock.wait-time=10000
storage.batch-loading=true
Thanks in advance.
Thanks, Pawan
On Saturday, 28 November 2020 at 16:19:20 UTC+5:30 sa...@... wrote: Pawan, can you elaborate more on the program you are trying to embed the script in? Regards, Sandeep
On Sat, 28 Nov 2020, 13:48 Pawan Shriwas, <shr...@...> wrote:
Hey Jason,
The same thing happens with me as well: the above script works well in the Gremlin console, but when we use it in Java we are not getting anything in the process() section as a callback. Could you help with the same?
On Wednesday, 7 February 2018 at 20:28:41 UTC+5:30 Jason Plurad wrote:
It means that it will use the 'storage.backend' value as the storage. See the code in GraphDatabaseConfiguration.java. It looks like your only choice is 'default', and it seems like the option is there for the future possibility to use a different backend. The code in the docs seemed to work ok, other than a minor change in the setStartTime() parameters. You can cut and paste this code into the Gremlin Console to use with the prepackaged distribution.

import java.util.concurrent.atomic.*;
import org.janusgraph.core.log.*;
import java.util.concurrent.*;

graph = JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties');

totalHumansAdded = new AtomicInteger(0);
totalGodsAdded = new AtomicInteger(0);
logProcessor = JanusGraphFactory.openTransactionLog(graph);
logProcessor.addLogProcessor("addedPerson").
    setProcessorIdentifier("addedPersonCounter").
    setStartTime(Instant.now()).
    addProcessor(new ChangeProcessor() {
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
            for (v in changeState.getVertices(Change.ADDED)) {
                if (v.label().equals("human")) totalHumansAdded.incrementAndGet();
                System.out.println("total humans = " + totalHumansAdded);
            }
        }
    }).
    addProcessor(new ChangeProcessor() {
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
            for (v in changeState.getVertices(Change.ADDED)) {
                if (v.label().equals("god")) totalGodsAdded.incrementAndGet();
                System.out.println("total gods = " + totalGodsAdded);
            }
        }
    }).
    build()

tx = graph.buildTransaction().logIdentifier("addedPerson").start();
u = tx.addVertex(T.label, "human");
u.property("name", "proteros");
u.property("age", 36);
tx.commit();
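As a side note on the log.[X].backend question further down: the corresponding entry in the graph properties file would look roughly like this (log name illustrative, matching whatever identifier is passed to addLogProcessor/logIdentifier; 'default' simply reuses the configured storage.backend):

# Sketch: per-log options live under the log identifier used in the code
log.addedPerson.backend=default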
If you inspect the keyspace in Cassandra afterwards, you'll see that a separate table is created for "ulog_addedPerson". Did you have some example code of what you are attempting?
On Wednesday, February 7, 2018 at 5:55:58 AM UTC-5, Sandeep Mishra wrote:
Hi Guys,
We are trying to use the transaction log feature of JanusGraph, which is not working as expected. No callback is received at: public void process(JanusGraphTransaction janusGraphTransaction, TransactionId transactionId, ChangeState changeState) {
The JanusGraph documentation says the value for log.[X].backend is 'default'. Not sure what exactly it means. Does it mean HBase, which is being used as the backend for data?
Please let me know if anyone has configured it.
Thanks and Regards, Sandeep Mishra
|
|