Graph corruption?


Joe Obernberger
 

Thank you Kevin and Boxuan for the help on this.  I was scratching my head over this and decided to blow away the graph and try again.  Every time I built a small graph and exported it to GraphML for viewing in Gephi, I ended up with node IDs that only existed in the edge list.
I printed the node ID everywhere it was used in my code and never saw those IDs in the output, yet the GraphML listed them in the edge list, and Gremlin queries confirmed that those 'zombie nodes' did exist in the graph. This was happening because I was using:
VertexLabel sourceLabel = mgmt.makeVertexLabel("source").partition().make();

Once I removed partition(), the "zombie" node IDs disappeared.  I wanted to use partitioning because those particular vertices can have a lot of edges, potentially billions.  [https://docs.janusgraph.org/advanced-topics/partitioning/]
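The symptom described above (edge endpoints whose IDs never appear as node elements) can be checked mechanically on an exported file. Below is a minimal, stdlib-only Java sketch, assuming a standard GraphML export in which each edge carries source and target attributes. The class and method names are hypothetical. It only lists the orphaned IDs; it does not explain why partitioned labels produce them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class GraphMlOrphanCheck {

    /** Returns every edge endpoint ID that has no matching node element. */
    static Set<String> orphanEndpoints(String graphMl) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // GraphML exports normally declare a default namespace
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(graphMl.getBytes(StandardCharsets.UTF_8)));

        // Collect the IDs of all declared nodes, whatever their namespace.
        Set<String> nodeIds = new TreeSet<>();
        NodeList nodes = doc.getElementsByTagNameNS("*", "node");
        for (int i = 0; i < nodes.getLength(); i++) {
            nodeIds.add(((Element) nodes.item(i)).getAttribute("id"));
        }

        // An edge endpoint that is not a declared node is an orphan ("zombie") ID.
        Set<String> orphans = new TreeSet<>();
        NodeList edges = doc.getElementsByTagNameNS("*", "edge");
        for (int i = 0; i < edges.getLength(); i++) {
            Element edge = (Element) edges.item(i);
            for (String attr : new String[] {"source", "target"}) {
                String id = edge.getAttribute(attr);
                if (!nodeIds.contains(id)) {
                    orphans.add(id);
                }
            }
        }
        return orphans;
    }

    public static void main(String[] args) throws Exception {
        // Tiny hypothetical export: edge e1 targets "99", which is never declared as a node.
        String graphMl = "<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">"
                + "<graph id=\"G\" edgedefault=\"directed\">"
                + "<node id=\"1\"/><node id=\"2\"/>"
                + "<edge id=\"e0\" source=\"1\" target=\"2\"/>"
                + "<edge id=\"e1\" source=\"1\" target=\"99\"/>"
                + "</graph></graphml>";
        System.out.println(orphanEndpoints(graphMl)); // prints [99]
    }
}
```

To check a real export, read the file into a String (for example with Files.readString on Java 11+) and pass it in. For very large exports, a streaming parser would avoid holding the whole DOM in memory.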

Am I perhaps not using it correctly, or is this a bug?

-Joe

On 6/27/2022 9:40 PM, Boxuan Li wrote:
Hi Joe,

I just wanted to check a few things:

1. Did you happen to enable `storage.batch-loading`? See https://docs.janusgraph.org/operations/bulk-loading/#batch-loading

2. IIRC you are using Cassandra. Did you happen to change storage.cql.read-consistency-level or storage.cql.write-consistency-level?

I wouldn't be too surprised even if the answers to both questions are "no", because the locking approach heavily relies on the storage backend and Cassandra (at the moment) only offers eventual consistency.
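The two questions above amount to a configuration checklist. The following is a minimal stdlib Java sketch of that check against a properties file. The class and method names are hypothetical. Missing keys are assumed to take JanusGraph's defaults as I understand them (batch loading off, QUORUM for both CQL consistency levels), and any other consistency value is reported simply as "changed". The sketch does not judge which levels are safe.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ConsistencySettingsCheck {

    static final String[] CONSISTENCY_KEYS = {
        "storage.cql.read-consistency-level",
        "storage.cql.write-consistency-level"
    };

    /** Lists the settings from the two questions above that differ from the assumed defaults. */
    static List<String> warnings(Properties props) {
        List<String> out = new ArrayList<>();
        // Missing key: assume the default, batch loading off.
        if (Boolean.parseBoolean(props.getProperty("storage.batch-loading", "false").trim())) {
            out.add("storage.batch-loading=true (batch loading disables locking)");
        }
        for (String key : CONSISTENCY_KEYS) {
            // Missing key: assume the default consistency level, QUORUM.
            String level = props.getProperty(key, "QUORUM").trim();
            if (!level.equalsIgnoreCase("QUORUM")) {
                out.add(key + "=" + level + " (changed from the default QUORUM)");
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file contents; in practice, load the file the graph is opened with.
        String example = "storage.backend=cql\n"
                + "storage.batch-loading=true\n"
                + "storage.cql.read-consistency-level=ONE\n";
        Properties props = new Properties();
        props.load(new StringReader(example));
        warnings(props).forEach(System.out::println);
    }
}
```

Even with a clean report, Boxuan's caveat still applies: the check rules out misconfiguration, not the limits of locking on an eventually consistent backend.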

Best,
Boxuan











Kevin Schmidt
 

Joe,

See https://groups.google.com/g/janusgraph-users/c/foaqfG-MB5E/m/tsNnkhPtBwAJ for a discussion on perhaps the same issue you are running into.

Long story short: you have a unique index, but even with locking you can end up with duplicates, and locking slows things down.  If possible, reorganize things or your traversals so you don't rely on the graph to enforce the unique index.
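One way to act on that suggestion is to enforce uniqueness in the loader itself. This is a minimal sketch that assumes every vertex for a given source value is created by a single loader JVM and that the load starts from an empty graph (or the cache is pre-populated). SourceVertexCache and the createVertex callback are hypothetical names; the callback stands in for whatever code actually adds the vertex to JanusGraph.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

public class SourceVertexCache {

    private final Map<String, Long> idsBySource = new ConcurrentHashMap<>();
    private final Function<String, Long> createVertex; // hypothetical: creates a vertex, returns its ID

    SourceVertexCache(Function<String, Long> createVertex) {
        this.createVertex = createVertex;
    }

    /** Returns the vertex ID for a source value, creating the vertex at most once per value. */
    long getOrCreate(String source) {
        // ConcurrentHashMap.computeIfAbsent applies the function at most once per key,
        // so concurrent callers asking for the same source share a single vertex.
        return idsBySource.computeIfAbsent(source, createVertex);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong nextId = new AtomicLong(1);
        AtomicInteger creations = new AtomicInteger();
        // Stand-in for the real vertex creation; it only counts calls and hands out IDs.
        SourceVertexCache cache = new SourceVertexCache(src -> {
            creations.incrementAndGet();
            return nextId.getAndIncrement();
        });

        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> cache.getOrCreate("DS_106"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println("vertices created for DS_106: " + creations.get()); // prints 1
    }
}
```

This does nothing across separate processes. There, each source value would have to be routed to a single loader, for example by partitioning the input on that key. The callback also runs while the map holds a lock on that key's bin, so it should be kept short.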

Kevin

On Mon, Jun 27, 2022 at 1:44 PM Joe Obernberger <joseph.obernberger@...> wrote:
Hi all - I'm seeing this from a recent graph I built:

gremlin> :> g.V(4162).valueMap()
==>{source=[DS_106], sourceName=[GDELTRecord3]}
gremlin> :> g.V(4146).valueMap()
==>{source=[DS_106], sourceName=[GDELTRecord3]}
gremlin> :> g.V(4226).valueMap()
==>{source=[DS_106], sourceName=[GDELTRecord3]}
gremlin> :> g.V(4250).valueMap()
==>{source=[DS_106], sourceName=[GDELTRecord3]}
gremlin>
gremlin>
gremlin> :> g.V().has("source","DS_106")
==>v[4226]
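The console output above shows four vertex IDs carrying source=DS_106, while the index lookup returns only v[4226]. Duplicates therefore have to be found by grouping on the property value rather than by querying the index. Here is a minimal stdlib Java sketch of that grouping, fed with the four (ID, source) pairs from the output. The class and method names are hypothetical. In practice the pairs would come from a full scan of the graph, not from the composite index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DuplicateSourceCheck {

    /** Groups vertex IDs by source value and keeps only values shared by two or more vertices. */
    static Map<String, List<Long>> duplicates(Map<Long, String> sourceById) {
        Map<String, List<Long>> idsBySource = new TreeMap<>();
        for (Map.Entry<Long, String> e : sourceById.entrySet()) {
            idsBySource.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // A unique index should leave every list with exactly one ID.
        idsBySource.values().removeIf(ids -> ids.size() < 2);
        return idsBySource;
    }

    public static void main(String[] args) {
        // Vertex IDs and source values copied from the valueMap() output above.
        Map<Long, String> observed = new LinkedHashMap<>();
        observed.put(4162L, "DS_106");
        observed.put(4146L, "DS_106");
        observed.put(4226L, "DS_106");
        observed.put(4250L, "DS_106");
        System.out.println(duplicates(observed)); // prints {DS_106=[4162, 4146, 4226, 4250]}
    }
}
```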

The graph has an index on source like this:

PropertyKey sourceProperty = mgmt.makePropertyKey("source").dataType(String.class).cardinality(Cardinality.SINGLE).make();
JanusGraphIndex sourceIndex = mgmt.buildIndex("bySourceComposite", Vertex.class).addKey(sourceProperty).unique().buildCompositeIndex();
mgmt.setConsistency(sourceProperty, ConsistencyModifier.LOCK);
mgmt.setConsistency(sourceIndex, ConsistencyModifier.LOCK);

How could the graph end up with several vertices with the same source string?  Still learning graphs...

Thank you!

-Joe








