The use of vertex-centric index


张雅南 <18834...@...>
 

Hi JanusGraph team,

I have created vertex-centric indexes, as shown below. I now want to use the index to fetch the top 500 edges in descending order of amount. However, the execution time is the same as without the vertex-centric index. How can I use the index to sort faster and fetch the first 500 edges more quickly?

Here's the graph I've built:

graph = JanusGraphFactory.open('janusgraph-cql-es-server-test2.properties')
mgmt = graph.openManagement()
mgmt.makeVertexLabel('VirtualAddress').make()
addr = mgmt.makePropertyKey('address').dataType(String.class).cardinality(SINGLE).make()
token_addr = mgmt.makePropertyKey('token_addr').dataType(String.class).cardinality(SINGLE).make()
transfer_to = mgmt.makeEdgeLabel('TRANSFER_TO').multiplicity(MULTI).make()
amount = mgmt.makePropertyKey('amount').dataType(Double.class).cardinality(SINGLE).make()
tx_hash = mgmt.makePropertyKey('tx_hash').dataType(String.class).cardinality(SINGLE).make()
tx_index = mgmt.makePropertyKey('tx_index').dataType(Integer.class).cardinality(SINGLE).make()
created_time = mgmt.makePropertyKey('created_time').dataType(Date.class).cardinality(SINGLE).make()
updated_time = mgmt.makePropertyKey('updated_time').dataType(Date.class).cardinality(SINGLE).make()
mgmt.buildIndex('addressComposite', Vertex.class).addKey(addr).buildCompositeIndex()
mgmt.buildIndex('addressTokenUniqComposite', Vertex.class).addKey(addr).addKey(token_addr).unique().buildCompositeIndex()
mgmt.buildEdgeIndex(transfer_to, "transferOutAmountTs", Direction.OUT, Order.desc, amount, created_time)
mgmt.buildEdgeIndex(transfer_to, "transferOutTs", Direction.OUT, Order.desc, created_time)
mgmt.commit()

Here's how I inserted the data: one starting vertex, a million edges going out from it, and 100 endpoints:

graph_conf = 'janusgraph-cql-es-server-test2.properties'
graph = JanusGraphFactory.open(graph_conf)
g = graph.traversal()
String line = "5244613,tx_hash_00,token_addr_00,from_addr_00,to_addr_00,,,6000,19305.57174337591,72,1520896044"
int start_value = 1
int end_value = 1000000

line = "tx_hash_00,token_addr_00,from_addr_00,to_addr_00,6000,72,1520896044"
columns = line.split(',', -1)
(tx_hash, token_addr, from_addr, to_addr, amount, log_index, timestamp) = columns
from_addr_node = g.addV('VirtualAddress').property('address', from_addr).property('token_addr', token_addr).next()
from_id = from_addr_node.id()
amount = amount.toBigDecimal()
tx_index = log_index.toInteger()
for (int i = start_value; i <= end_value; i++) {
    to_addr_node = g.addV('VirtualAddress').property('address', to_addr + String.valueOf(i)).property('token_addr', token_addr).next()
    to_id = to_addr_node.id()
    Date ts = new Date((timestamp.toLong() - i) * 1000)
    g.addE('TRANSFER_TO').from(g.V(from_id)).to(g.V(to_id))
            .property('amount', amount + i)
            .property('tx_hash', tx_hash)
            .property('tx_index', tx_index + i)
            .property('created_time', ts)
            .next()


    if (i % 20000 == 0) {
        println("[total:${i}]")
        System.sleep(500)
        g.tx().commit()
        graph.close()
        System.sleep(5000)


        graph = JanusGraphFactory.open(graph_conf)
        g = graph.traversal()
        System.sleep(5000)
    }
    g.tx().commit()
}

graph.close()
 

Here is my query:

g.V().has('address', 'from_addr_00').outE('TRANSFER_TO').order().by('amount', desc).limit(500).valueMap().toList()


HadoopMarc <bi...@...>
 

You already specified that the vertex-centric index on the amount key is ordered when you created the index. By explicitly reordering the results in the traversal, the index cannot take effect, because the reordering requires all edges to be retrieved instead of just the first 500.

HTH,    Marc
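
To make the point above concrete, here is a minimal standalone Java sketch. It does not use JanusGraph at all: a list kept in descending order stands in for the vertex-centric index, and the class name, method names, and the read counter are all hypothetical, introduced only for illustration. It shows that sorting and then taking 500 has to read every amount, while scanning an already-ordered structure can stop after 500 reads.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class TopKSketch {

    /** Sorts every amount and keeps the first k, so all n inputs are read. */
    static List<Double> topBySorting(List<Double> unsorted, int k, AtomicLong reads) {
        List<Double> copy = new ArrayList<>(unsorted);
        reads.addAndGet(copy.size());            // every element must be fetched before sorting
        copy.sort(Comparator.reverseOrder());
        return new ArrayList<>(copy.subList(0, Math.min(k, copy.size())));
    }

    /** Reads only the first k entries of a list that is already in descending order. */
    static List<Double> topByIndexScan(List<Double> descendingIndex, int k, AtomicLong reads) {
        List<Double> out = new ArrayList<>();
        for (Double amount : descendingIndex) {
            if (out.size() == k) {
                break;                           // stop as soon as k entries were collected
            }
            reads.incrementAndGet();
            out.add(amount);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same shape as the test data in this thread: amounts 6001 .. 1006000.
        List<Double> amounts = new ArrayList<>();
        for (int i = 1; i <= 1_000_000; i++) {
            amounts.add(6000.0 + i);
        }
        // Stand-in for the index: the same values, stored in descending order.
        List<Double> index = new ArrayList<>(amounts);
        index.sort(Comparator.reverseOrder());

        AtomicLong sortReads = new AtomicLong();
        AtomicLong scanReads = new AtomicLong();
        List<Double> bySort = topBySorting(amounts, 500, sortReads);
        List<Double> byScan = topByIndexScan(index, 500, scanReads);

        System.out.println("same result: " + bySort.equals(byScan));
        System.out.println("reads with sort: " + sortReads.get());
        System.out.println("reads with ordered scan: " + scanReads.get());
    }
}
```

Both methods return the same 500 values; only the number of elements touched differs, which is the cost the explicit order() step reintroduces.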



Leah <18834...@...>
 


How can I use the index to get the top 500 edges, sorted by amount in descending order, faster?



HadoopMarc <bi...@...>
 

g.V().has('address', 'from_addr_00').outE('TRANSFER_TO').has('amount', gte(6000)).limit(500).valueMap().toList()

I am not sure the has() step is even necessary; perhaps just has('amount') is sufficient to trigger the already-sorted index.

Best wishes,

Marc
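
As a rough illustration of why a range condition plus limit() can return the largest amounts first, here is a small standalone Java sketch. It is not JanusGraph code: a TreeMap with a reversed comparator stands in for an index built with Order.desc, the class and method names are hypothetical, and each amount is assumed to be unique so that a plain map is enough. Whether a real traversal returns edges in index order without an explicit order() step is an assumption here, based only on the behaviour discussed in this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class DescendingRangeScan {

    /** Returns up to `limit` edge ids whose amount is >= minAmount, largest amount first. */
    static List<String> firstMatches(NavigableMap<Double, String> descIndex,
                                     double minAmount, int limit) {
        List<String> out = new ArrayList<>();
        // With a reversed comparator, headMap(x, true) is the slice of keys >= x,
        // iterated from the largest key downwards.
        for (String edgeId : descIndex.headMap(minAmount, true).values()) {
            if (out.size() == limit) {
                break;                       // the scan ends after `limit` entries
            }
            out.add(edgeId);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for an index ordered descending on amount (simplified: unique amounts).
        NavigableMap<Double, String> index = new TreeMap<>(Comparator.reverseOrder());
        for (int i = 1; i <= 1000; i++) {
            index.put(6000.0 + i, "edge-" + i);
        }
        System.out.println(firstMatches(index, 6000.0, 3));
    }
}
```

The range condition selects a contiguous slice of the ordered structure, so taking the first entries of that slice already yields the highest amounts without a separate sort.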



Leah <18834...@...>
 

Hi Marc,
After testing, I found that the has() step with a condition is necessary. This solution is very effective. Thank you for all your assistance.

Warm regards,
Leah