Graph corruption?
Joe Obernberger
Hi all - I'm seeing this from a recent graph I built:
gremlin> :> g.V(4162).valueMap()
==>{source=[DS_106], sourceName=[GDELTRecord3]}
gremlin> :> g.V(4146).valueMap()
==>{source=[DS_106], sourceName=[GDELTRecord3]}
gremlin> :> g.V(4226).valueMap()
==>{source=[DS_106], sourceName=[GDELTRecord3]}
gremlin> :> g.V(4250).valueMap()
==>{source=[DS_106], sourceName=[GDELTRecord3]}
gremlin> :> g.V().has("source","DS_106")
==>v[4226]

The graph has an index on source like this:

// "source" is a single-valued string property with a unique composite index
PropertyKey sourceProperty = mgmt.makePropertyKey("source").dataType(String.class).cardinality(Cardinality.SINGLE).make();
JanusGraphIndex sourceIndex = mgmt.buildIndex("bySourceComposite", Vertex.class).addKey(sourceProperty).unique().buildCompositeIndex();
// ask JanusGraph to use locking when writing the key and the index
mgmt.setConsistency(sourceProperty, ConsistencyModifier.LOCK);
mgmt.setConsistency(sourceIndex, ConsistencyModifier.LOCK);

How could the graph end up with several vertices with the same source string? Still learning graphs...

Thank you!
-Joe
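A minimal sketch for auditing the problem, assuming the (vertex ID, source) pairs have already been pulled out of the graph (for example with `valueMap()` traversals like the ones above) into a plain Java map. `DuplicateSourceFinder` is a hypothetical helper, not part of JanusGraph: it groups vertex IDs by `source` and keeps only values owned by more than one vertex, which is exactly what a working unique index should make impossible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical audit helper: reports every "source" value that is held by
 * more than one vertex. Input is assumed to be exported (vertexId -> source) pairs.
 */
final class DuplicateSourceFinder {

    static Map<String, List<Long>> findDuplicates(Map<Long, String> sourceByVertex) {
        // TreeMap keeps the report sorted by source value.
        Map<String, List<Long>> idsBySource = new TreeMap<>();
        for (Map.Entry<Long, String> e : sourceByVertex.entrySet()) {
            idsBySource.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Drop values that appear on only one vertex; what remains violates uniqueness.
        idsBySource.values().removeIf(ids -> ids.size() < 2);
        return idsBySource;
    }

    public static void main(String[] args) {
        // Vertex IDs and values taken from the console output in the question.
        Map<Long, String> exported = new LinkedHashMap<>();
        exported.put(4162L, "DS_106");
        exported.put(4146L, "DS_106");
        exported.put(4226L, "DS_106");
        exported.put(4250L, "DS_106");
        System.out.println(findDuplicates(exported)); // prints {DS_106=[4162, 4146, 4226, 4250]}
    }
}
```

In the console session above, the index lookup `g.V().has("source","DS_106")` returned only v[4226], which suggests the composite index holds a single entry for DS_106 while the other three vertices were written without a matching index entry.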
Kevin Schmidt
Joe,

See https://groups.google.com/g/janusgraph-users/c/foaqfG-MB5E/m/tsNnkhPtBwAJ for a discussion of what may be the same issue you are running into. Long story short: you have a unique index, but even with locking you can end up with duplicates, and using locking slows things down. If possible, restructure your data or your traversals so you don't rely on the graph to enforce the unique index.

Kevin

On Mon, Jun 27, 2022 at 1:44 PM Joe Obernberger <joseph.obernberger@...> wrote:
Hi all - I'm seeing this from a recent graph I built:
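One way to act on Kevin's suggestion, sketched under assumptions: decide which vertex owns each `source` value in the loader itself, so the unique index becomes a safety net rather than the mechanism. `SourceKeyedLoader` and its `createVertex` callback are hypothetical (the callback stands in for the real JanusGraph write); the sketch relies only on `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/**
 * Hypothetical loader-side deduplication: one vertex ID per "source" value,
 * decided by the application instead of the graph's unique index.
 * Valid only inside a single JVM.
 */
final class SourceKeyedLoader {
    private final ConcurrentHashMap<String, Long> vertexIdBySource = new ConcurrentHashMap<>();
    private final Function<String, Long> createVertex; // stand-in for the actual graph write

    SourceKeyedLoader(Function<String, Long> createVertex) {
        this.createVertex = createVertex;
    }

    /** Returns the vertex ID for this source, creating the vertex at most once. */
    long getOrCreate(String source) {
        // Atomic on ConcurrentHashMap: concurrent callers for the same key wait for one creator.
        return vertexIdBySource.computeIfAbsent(source, createVertex);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong nextId = new AtomicLong(4000);
        AtomicLong creations = new AtomicLong();
        SourceKeyedLoader loader = new SourceKeyedLoader(src -> {
            creations.incrementAndGet();
            return nextId.incrementAndGet();
        });

        // Eight threads load 1000 records that all share source DS_106.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> loader.getOrCreate("DS_106"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("vertices created for DS_106: " + creations.get()); // prints 1
    }
}
```

This only deduplicates within one process and one run. With several loader processes, each `source` value would have to be routed to a single loader (for example by hashing the key), and a restarted loader would need to look up vertices that already exist before creating new ones.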
|
Boxuan Li
Hi Joe,
I just wanted to check a few things:

1. Did you happen to enable `storage.batch-loading`? See https://docs.janusgraph.org/operations/bulk-loading/#batch-loading
2. IIRC you are using Cassandra. Did you happen to change `storage.cql.read-consistency-level` or `storage.cql.write-consistency-level`?

I wouldn't be too surprised even if the answers to both questions are "no", because the locking approach relies heavily on the storage backend, and Cassandra (at the moment) only offers eventual consistency.

Best,
Boxuan
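A small pre-flight check along the lines of Boxuan's two questions, sketched in plain Java over a `java.util.Properties` file. The property names are JanusGraph's; the defaults assumed when a key is absent (`false`, `QUORUM`, `QUORUM`, replication factor 1), the use of `storage.cql.replication-factor` as the keyspace's real replication factor, and the single-datacenter replica arithmetic are all assumptions of this sketch, and `UniquenessConfigCheck` is a hypothetical name. It only catches obvious misconfigurations; a clean result does not make Cassandra strongly consistent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical checker for settings that weaken uniqueness enforcement. */
final class UniquenessConfigCheck {

    // Replicas touched by a consistency level, assuming a single datacenter.
    static int replicas(String level, int rf) {
        switch (level.toUpperCase(Locale.ROOT)) {
            case "ONE":
            case "LOCAL_ONE":
                return 1;
            case "QUORUM":
            case "LOCAL_QUORUM":
                return rf / 2 + 1;
            case "ALL":
                return rf;
            default:
                throw new IllegalArgumentException("level not handled by this sketch: " + level);
        }
    }

    static List<String> check(Properties p) {
        List<String> warnings = new ArrayList<>();
        // Batch loading turns off JanusGraph's internal consistency checks, including locking.
        if (Boolean.parseBoolean(p.getProperty("storage.batch-loading", "false"))) {
            warnings.add("storage.batch-loading=true disables locking");
        }
        int rf = Integer.parseInt(p.getProperty("storage.cql.replication-factor", "1"));
        String read = p.getProperty("storage.cql.read-consistency-level", "QUORUM");
        String write = p.getProperty("storage.cql.write-consistency-level", "QUORUM");
        // A read is only guaranteed to reach a replica holding the latest write if R + W > RF.
        if (replicas(read, rf) + replicas(write, rf) <= rf) {
            warnings.add("read " + read + " + write " + write + " do not overlap at RF=" + rf);
        }
        return warnings;
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(String.join("\n",
                "storage.batch-loading=true",
                "storage.cql.replication-factor=3",
                "storage.cql.read-consistency-level=ONE",
                "storage.cql.write-consistency-level=ONE")));
        check(p).forEach(System.out::println);
    }
}
```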
|
Joe Obernberger
Thank you Kevin and Boxuan for the help on this. I was scratching my head over this and decided to blow away the graph and try again. Every time I built a small graph and exported it to GraphML for viewing in Gephi, I would have node IDs that only existed in the edge list. Once I removed partitioning, the "zombie" node IDs disappeared. I wanted to use partitioning for those particular vertices since they can have a lot of edges; potentially billions. [https://docs.janusgraph.org/advanced-topics/partitioning/] Perhaps I'm not using it correctly, or is this a bug?

-Joe

On 6/27/2022 9:40 PM, Boxuan Li wrote:
Hi Joe,
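To compare exports with and without partitioning, a minimal sketch that lists the "zombie" IDs, i.e. edge endpoints with no matching node entry. It assumes the node IDs and (source, target) pairs have already been parsed out of the GraphML file; `DanglingEdgeCheck` is a hypothetical name. It does not diagnose the cause, it only makes the symptom easy to measure.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical check for edge endpoints that have no node entry in an export. */
final class DanglingEdgeCheck {

    static Set<String> danglingEndpoints(Set<String> nodeIds, List<String[]> edges) {
        Set<String> missing = new LinkedHashSet<>(); // keeps first-seen order
        for (String[] edge : edges) {
            // edge[0] is the source ID, edge[1] the target ID
            for (String endpoint : edge) {
                if (!nodeIds.contains(endpoint)) {
                    missing.add(endpoint);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> nodes = new LinkedHashSet<>(Arrays.asList("1", "2"));
        List<String[]> edges = Arrays.asList(
                new String[] {"1", "2"},
                new String[] {"2", "99"}); // "99" is a made-up ID with no node entry
        System.out.println(danglingEndpoints(nodes, edges)); // prints [99]
    }
}
```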
|