Is it possible to choose on which server the vertices have to be placed?
Alexandr Porunov <alexand...@...>
Hello,

I need to choose, on the application side, where vertices are stored. For example, I have 3 servers: server1, server2, server3.

I need to store these vertices: user1, user2, user3, user4, user5.

I store them randomly:
server1: user1, user2
server2: user3, user4
server3: user5

Then I add edges:
user1 and user3 are friends
user1 and user5 are friends
user3 and user5 are friends
user2 and user4 are friends

After some time (maybe once a week) I want to read the graph and optimize graph traversal by minimizing edge cuts. I want to move the user vertices so that they are stored like this:
server1: user2, user4
server2: user1, user3, user5
server3: empty

Is it possible to achieve this with JanusGraph? Will traversals still work if I place the vertices on my own?

Best regards,
Alexandr
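To make "edge cuts" concrete, here is a minimal Python sketch that counts the edges whose endpoints sit on different servers for the two placements described above. It is only an in-memory illustration: the function name `count_edge_cuts` and the dictionary layout are assumptions made for this example and have nothing to do with any JanusGraph API.

```python
from typing import Dict, List, Tuple

# Friendship edges from the example above.
EDGES: List[Tuple[str, str]] = [
    ("user1", "user3"),
    ("user1", "user5"),
    ("user3", "user5"),
    ("user2", "user4"),
]

# Initial (random) placement of vertices on servers.
RANDOM_PLACEMENT: Dict[str, List[str]] = {
    "server1": ["user1", "user2"],
    "server2": ["user3", "user4"],
    "server3": ["user5"],
}

# Desired placement after optimization.
OPTIMIZED_PLACEMENT: Dict[str, List[str]] = {
    "server1": ["user2", "user4"],
    "server2": ["user1", "user3", "user5"],
    "server3": [],
}


def count_edge_cuts(placement: Dict[str, List[str]],
                    edges: List[Tuple[str, str]]) -> int:
    """Count edges whose two endpoints live on different servers."""
    # Invert the placement into a vertex -> server lookup.
    server_of = {v: server for server, vs in placement.items() for v in vs}
    return sum(1 for a, b in edges if server_of[a] != server_of[b])


if __name__ == "__main__":
    print("random placement cuts:   ", count_edge_cuts(RANDOM_PLACEMENT, EDGES))
    print("optimized placement cuts:", count_edge_cuts(OPTIMIZED_PLACEMENT, EDGES))
```

With the random placement every friendship crosses a server boundary; with the desired placement none does, which is why a "get user friends" query would touch a single server.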
Misha Brukman <mbru...@...>
What are "server1", "server2" and "server3" in your example?

JanusGraph does not store data itself; it uses a storage backend, either an embedded, in-process one such as BerkeleyDB or a distributed one such as HBase, Cassandra, Bigtable, etc. As such, JanusGraph does not manage the storage or location of data. It defers that to its storage backend, and I don't believe JanusGraph has any influence over where the data is stored, since that sits behind an abstraction layer from JanusGraph's perspective.

If your concern is performance, the general recommendation is to benchmark one or more storage engines with a representative workload and see whether the performance matches your requirements. If not, you may need to either tune that storage backend appropriately or choose a different one.

On Tue, Oct 31, 2017 at 8:00 AM, Alexandr Porunov <alexand...@...> wrote:
David Pitera <piter...@...>
I am not sure _why_ you need different "servers" in your example; however, it seems that what you might want are different _graphs_. You can open different graphs using the new ConfiguredGraphFactory (http://docs.janusgraph.org/latest/configuredgraphfactory.html).

As Misha said, you use backends to store your data. A graph just points to a backend, which is basically defined by a location/port and a directory/keyspace/table. Therefore, if you want three separate representations to hold data, you might create a Template Configuration pointing to a specific Cassandra location, for example, and then use ConfiguredGraphFactory#create(graphName) to create 3 separate graphs. Each graph will then store its data in its own keyspace in Cassandra. The documentation will also lead you through other examples using the ConfigurationManagementGraph and ConfiguredGraphFactory APIs.

If you do not want to use the ConfiguredGraphFactory APIs, but storing data in separate representations is your goal, you can use the JanusGraphFactory#open(Configuration) or JanusGraphFactory#open(File location) method to open a new graph by supplying an entire configuration for it, configuring each graph to point to a separate backend or table/keyspace/directory.

On Tue, Oct 31, 2017 at 7:40 PM, 'Misha Brukman' via JanusGraph users <janusgra...@...> wrote:
Robert Dale <rob...@...>
I believe Alexandr is referring to Graph Partitioning strategies: http://docs.janusgraph.org/latest/graph-partitioning.html

Robert Dale

On Tue, Oct 31, 2017 at 8:20 PM, David Pitera <piter...@...> wrote:
master...@...
Sorry that I didn't mention graph partitioning. As Robert Dale noticed, that is exactly what I am referring to.

According to the documentation, the recommended number of partitions is about twice the number of servers (or twice the number you expect to have in the foreseeable future). This means that each server will hold 2 or more partitions, so by choosing a partition we can indirectly say on which server a vertex is placed. We can also change the partitioning algorithm, but the placement will still be predefined.

The thing is that in most cases we can't predict where to place vertices so as to minimize edge cuts in the future. I showed an example where the users are not friends at the beginning. After some time they become friends, and now you have a lot of edge cuts. A query like "get user friends" will have to ask a lot of nodes (in a large graph) to get the list of friends.

This is a problem that Facebook solved using the Kernighan–Lin algorithm. They store vertices randomly and then use Apache Giraph to process the graph incrementally, moving vertices from server to server to minimize edge cuts. Here is the link: https://code.facebook.com/posts/274771932683700/large-scale-graph-partitioning-with-apache-giraph/

So, my questions are: Can we decide at the application level where a vertex is placed? Can we move vertices from server to server (or from partition to partition)? Or does the partitioning have to be predefined?

Best regards,
Alexandr

On Wednesday, November 1, 2017 at 1:41:42 UTC+2, Misha Brukman wrote:
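The kind of periodic rebalancing asked about here can be sketched as a simple greedy local search: repeatedly move a vertex to the server that holds more of its neighbours, as long as that server has room. This is a toy, in-memory illustration only. It is not the Kernighan–Lin algorithm, not Facebook's Giraph implementation, and not something JanusGraph exposes; the function `refine_placement`, the `capacity` parameter and the data layout are all hypothetical names chosen for this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def refine_placement(server_of: Dict[str, str],
                     edges: List[Tuple[str, str]],
                     servers: List[str],
                     capacity: int,
                     max_rounds: int = 10) -> Dict[str, str]:
    """Greedy local search that moves vertices toward their neighbours.

    A vertex is moved only when the move strictly reduces the number of
    its edges that cross servers and the target server is below capacity.
    """
    server_of = dict(server_of)  # work on a copy, leave the input untouched
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    load = Counter(server_of.values())  # current vertex count per server

    for _ in range(max_rounds):
        moved = False
        for v in sorted(server_of):  # fixed order keeps the run deterministic
            here = server_of[v]
            # How many neighbours of v currently sit on each server.
            votes = Counter(server_of[n] for n in neighbours[v])
            best, best_gain = here, 0
            for s in servers:
                if s == here or load[s] >= capacity:
                    continue
                gain = votes[s] - votes[here]  # cut edges removed by moving
                if gain > best_gain:
                    best, best_gain = s, gain
            if best != here:
                server_of[v] = best
                load[here] -= 1
                load[best] += 1
                moved = True
        if not moved:  # local optimum reached
            break
    return server_of


# The example from the first message: random initial placement.
INITIAL = {"user1": "server1", "user2": "server1",
           "user3": "server2", "user4": "server2",
           "user5": "server3"}
FRIENDS = [("user1", "user3"), ("user1", "user5"),
           ("user3", "user5"), ("user2", "user4")]

if __name__ == "__main__":
    result = refine_placement(INITIAL, FRIENDS,
                              ["server1", "server2", "server3"], capacity=3)
    for vertex, server in sorted(result.items()):
        print(vertex, "->", server)
```

On this small example, with an assumed capacity of 3 vertices per server, the search ends with user2 and user4 on server1 and user1, user3 and user5 on server2. A greedy pass like this can get stuck in local optima on larger graphs, and it says nothing about how the resulting moves would be applied to a real storage backend.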