I'm trying to optimize a force-directed graph. So far I have implemented it using the naïve O(n²) method. It can only handle around 1000 nodes, which is far too few for my needs.
Those of you familiar with the algorithm will know that it has two main components: Coulomb-style repulsion between nodes, and Hooke's-law spring attraction along edges. Both of these components involve pairwise calculations between nodes.
The Barnes-Hut algorithm works nicely for the former, bringing the repulsion component down to O(n log n). However, I have not been able to find something similar for the spring component. I have considered the following:
Dividing the nodes into overlapping bins based on location and performing pairwise calculations only between nodes in the same bin. However, this might not work in all cases, especially since the initial configuration of the nodes is random and connected nodes could end up anywhere. I could change how I generate the nodes, but unless they all land in the same bin it would still produce incorrect results.
Storing the edges separately and iterating through them to calculate the spring forces. Right now this looks to be the most promising method to me.
Is there a better way I haven't considered? If it matters at all, I'm using C# and it'd be nice if it were trivial to throw in a parallel loop.
I feel the second option that you gave should work in linear time in the number of edges. Just iterate through them and keep updating the resultant forces on the two nodes at either end.
EDIT: Sorry I earlier thought that each node is connected to every other node through a spring.
If I understood you correctly, you have O(n log n) for the repulsion component and the attraction component is sparse: each node has on average k << n spring-like attractions. If so, you can take care of the attraction component in O(n * k) by storing the attractions in an adjacency list instead of an adjacency matrix.
In the end, I implemented what I described in my second case and it worked well. I maintained and iterated through a collection of neighbours for each node, which allowed the entire acceleration routine to be easily parallelized (along with Barnes-Hut). Overall time complexity is O(max(n log n, k)) where k is the total number of edges. My C# implementation handles around 100000 nodes and 150000 edges at an acceptable level of performance.
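In case it helps anyone, here is a minimal sketch of what that per-node neighbour iteration can look like in C# with a parallel loop. The Node class, constants and field names are made up for illustration, and each edge is assumed to appear in both endpoints' neighbour lists, which is what makes the parallel loop race-free.

    using System.Collections.Generic;
    using System.Numerics;
    using System.Threading.Tasks;

    class Node
    {
        public Vector2 Position;
        public Vector2 Force;                      // accumulated this frame
        public List<Node> Neighbours = new List<Node>();
    }

    static class SpringForces
    {
        const float Stiffness = 0.05f;             // Hooke's constant (illustrative value)
        const float RestLength = 30f;              // natural spring length (illustrative value)

        public static void Accumulate(IReadOnlyList<Node> nodes)
        {
            // Total work is O(sum of degrees) = O(2k) for k edges.
            Parallel.For(0, nodes.Count, i =>
            {
                Node n = nodes[i];
                foreach (Node m in n.Neighbours)
                {
                    Vector2 delta = m.Position - n.Position;
                    float dist = delta.Length();
                    if (dist < 1e-6f) continue;    // avoid division by zero for coincident nodes
                    // Hooke's law: force proportional to displacement from the rest length.
                    Vector2 f = delta / dist * (dist - RestLength) * Stiffness;
                    n.Force += f;                  // only node i is written by this thread, so no race
                }
            });
        }
    }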
I am producing some simple network drawing software in C# (I shall write 'network' rather than 'graph' for clarity). Networks may eventually have multiple directed edges between some vertices/nodes. I have designed the data structure for this software as follows:
Each node and edge has a corresponding 'interactible' object determining the visualisation and processing inputs such as clicks targeted at these visible objects. These objects may store any amount of data about the respective objects that I choose, and for now they contain the endpoints of the edges, for example, as well as an integer identifier for each object.
Nodes (and edges between them) are divided into connected components. I want to implement routines which merge these components upon addition of edges and detect disconnectedness when edges or nodes are deleted.
The overall data for the network, which shall for now simply record the number of edges between each pair of nodes (ignoring direction), is to be represented in an instance of an INetwork interface, which contains such routines as those for adding and removing nodes and edges, identifying the neighbours of a vertex, and so on.
My question, then, is how to actually implement this INetwork interface. The implementation can be made to vary with the sparseness of the graph, but I would like to know what would be most efficient in terms of memory, processing speed, and so on.
More precisely, I am aware that I could just produce an adjacency matrix for each component, or similarly an adjacency list, but which types are best suited to these roles in C#, bearing in mind that my nodes have integer identifiers which I would expect are best left constant throughout?
Would Dictionary<int,Dictionary<int,int>> ever be a good way of implementing an adjacency count, so as to be able to remove entries efficiently? Would a large 2-dimensional array work given the node indexing? If I instead store a matrix as a 1-dimensional array, say as an int[] with some methods for treating this as a 2- or 3-dimensional array, would this be better in any meaningful way?
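To make the dictionary-of-dictionaries idea concrete, here is roughly what I have in mind (the class and method names are just placeholders, and direction is ignored, as stated above):

    using System.Collections.Generic;
    using System.Linq;

    class AdjacencyCount
    {
        // outer key = node id, inner key = neighbour id, value = parallel edge count
        private readonly Dictionary<int, Dictionary<int, int>> _adj =
            new Dictionary<int, Dictionary<int, int>>();

        public void AddEdge(int a, int b)
        {
            Increment(a, b);
            Increment(b, a);                       // store both directions for undirected counting
        }

        // Returns true if this was the last edge between a and b,
        // i.e. the caller should now check whether the component split.
        public bool RemoveEdge(int a, int b)
        {
            bool last = Decrement(a, b);
            Decrement(b, a);
            return last;
        }

        public IEnumerable<int> Neighbours(int node)
        {
            return _adj.TryGetValue(node, out var inner)
                ? (IEnumerable<int>)inner.Keys
                : Enumerable.Empty<int>();
        }

        private void Increment(int from, int to)
        {
            if (!_adj.TryGetValue(from, out var inner))
                _adj[from] = inner = new Dictionary<int, int>();
            inner[to] = inner.TryGetValue(to, out var c) ? c + 1 : 1;
        }

        private bool Decrement(int from, int to)
        {
            var inner = _adj[from];                // assumes the edge exists
            if (--inner[to] == 0) { inner.Remove(to); return true; }
            return false;
        }
    }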
Of course, there are pros and cons to every implementation, but I expect the most taxing routine will always be the one that checks whether the removal of an edge has resulted in the network becoming disconnected. Thus I want whatever implementation I use to be able to quickly check:
Whether a removed edge was the last edge between its pair of vertices.
If so, to be able to quickly identify the neighbours of each node successively so as to identify which nodes lie in which of the resulting components.
Thanks in advance for any and all suggestions. In case anyone wonders, the choice of C# was because of my reliance on the Unity game-making engine for its rendering and (possibly, later on) physics capabilities.
My lack of in-depth understanding of the fundamentals has taken a toll on these types of problem-solving challenges.
The HackerRank matrix rotation problem is a very fun one to solve. I recommend HackerRank (https://www.hackerrank.com/challenges/matrix-rotation-algo) to anyone trying to enrich their coding skills.
The problem summary is that you are given an R x C matrix of integers, where the minimum of R and C must be even. You have to rotate the matrix anti-clockwise x times. In case it is not clear: the rotation applies to the elements of the matrix, not the matrix dimensions.
So I solved this problem with two algorithms. They are both very similar in that you can imagine the matrix as layers of an onion, where you loop through each layer and rotate the elements in that layer. The number of rotations needed is simply x % (count of elements in that layer), so if you are given x = 1,000,000 it doesn't make sense to repeat full rotations.
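To make that concrete, here is a stripped-down sketch of the layer rotation; it is not either of the linked solutions, just the shared onion-layer idea. The even-minimum constraint means every ring has at least two rows and two columns.

    using System.Collections.Generic;
    using System.Linq;

    static class MatrixRotation
    {
        // Rotate the ring at depth 'layer' anti-clockwise by x positions.
        static void RotateLayer(int[,] m, int layer, int x)
        {
            int rows = m.GetLength(0), cols = m.GetLength(1);
            int top = layer, left = layer, bottom = rows - 1 - layer, right = cols - 1 - layer;

            // Walk the ring anti-clockwise, remembering each cell's coordinates.
            var cells = new List<(int r, int c)>();
            for (int r = top; r <= bottom; r++) cells.Add((r, left));         // down the left side
            for (int c = left + 1; c <= right; c++) cells.Add((bottom, c));   // along the bottom
            for (int r = bottom - 1; r >= top; r--) cells.Add((r, right));    // up the right side
            for (int c = right - 1; c > left; c--) cells.Add((top, c));       // back along the top

            var values = cells.Select(p => m[p.r, p.c]).ToArray();

            // Full rotations are no-ops, so only x % (ring length) matters.
            int n = cells.Count, shift = x % n;
            for (int i = 0; i < n; i++)
            {
                var (r, c) = cells[(i + shift) % n];   // each value moves 'shift' steps along the ring
                m[r, c] = values[i];
            }
        }

        public static void RotateAntiClockwise(int[,] m, int x)
        {
            int layers = System.Math.Min(m.GetLength(0), m.GetLength(1)) / 2;
            for (int layer = 0; layer < layers; layer++)
                RotateLayer(m, layer, x);
        }
    }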
The first one, which is the faster of the two, is:
https://codetidy.com/8002/
The second one does not loop through the number of rotations, but instead does some heavy logic and math to figure out where to move each element.
https://codetidy.com/8001/
So when I was writing the second one, I assumed it would be much faster, because you don't iterate through the maximum number of rotations in each layer. However, it ended up being slower.
I don't quite understand why. I logged the number of iterations to the console and the first one does 50x more iterations, yet it is faster.
Number of iterations is not everything. Here are a few general things that might affect the performance.
One important thing to keep in mind with arrays and matrices is cache hits. If your operations generate lots of cache hits they will seem orders of magnitude faster. To get cache hits you usually need to access memory in order: for an array that means sequentially forward; for a matrix it means varying the last (innermost) index fastest. To get misses you need to jump around in increments larger than the size of a cache line (CPU dependent). Fun experiment: benchmark for (i...) for (j...) ++m[i][j] against for (i...) for (j...) ++m[j][i] to see the difference.
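If you want to try that experiment in C#, a rough sketch looks like this (the array size is arbitrary and this is not a rigorous benchmark, but the gap is usually obvious):

    using System;
    using System.Diagnostics;

    class CacheDemo
    {
        static void Main()
        {
            const int N = 4000;
            var m = new int[N, N];                   // C# rectangular arrays are row-major
            var sw = Stopwatch.StartNew();

            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    ++m[i, j];                       // sequential access: mostly cache hits
            Console.WriteLine($"row-major:    {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    ++m[j, i];                       // strided access: frequent cache misses
            Console.WriteLine($"column-major: {sw.ElapsedMilliseconds} ms");
        }
    }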
In your case I would guess that the faster approach has very linear access on the horizontal parts at least.
Then there's branch prediction. Modern CPUs pipeline instructions to make better use of the existing hardware. Branches (ifs) break the pipeline, since the CPU doesn't yet know which path to take (the condition is still being evaluated). As an optimization, the compiler/CPU picks one path and starts processing it; if the condition turns out the other way, everything is thrown away and processing restarts. Checking something that usually gives the same result (like i < n) will be faster than something that's harder to predict.
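A classic way to see this in C# is to time the same loop over unsorted and then sorted data; this is only a sketch and the exact numbers will vary by machine:

    using System;
    using System.Diagnostics;

    class BranchDemo
    {
        static long SumAbove(int[] data)
        {
            long sum = 0;
            foreach (int v in data)
                if (v >= 128) sum += v;             // this branch is the whole story
            return sum;
        }

        static void Main()
        {
            var rnd = new Random(1);
            var data = new int[10_000_000];
            for (int i = 0; i < data.Length; i++) data[i] = rnd.Next(256);

            var sw = Stopwatch.StartNew();
            SumAbove(data);
            Console.WriteLine($"unsorted: {sw.ElapsedMilliseconds} ms");   // branch outcome is ~random

            Array.Sort(data);                        // now the branch is false, false, ..., true, true
            sw.Restart();
            SumAbove(data);
            Console.WriteLine($"sorted:   {sw.ElapsedMilliseconds} ms");
        }
    }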
These are some low-level reasons why the simpler approach might end up faster. Add some higher-level reasons (like the compiler not optimizing the code the way you expect) and you get results like this.
An important note: complexity reflects asymptotic behavior. Yes, the second approach will be faster for a sufficiently large matrix, and it's very likely that the sizes used for this problem are not sufficiently large.
I am trying to implement a pathfinding algorithm, but I think I'm running into terminology issues, in that I'm not quite sure how to explain what I need the algorithm to do.
I have a regular grid of nodes, and I am trying to find all nodes within a certain "Manhattan Distance".
Finding the nodes within, say, 5, is simple enough.
But I am interested in a "Weighted Manhattan Distance", where certain squares "cost" twice as much (or more) to enter. For instance, if orange squares cost 2 to enter, and purple squares cost 10, the graph I'm interested in looks like this:
Firstly, is there a term for this? It's hard to look up info on things when you're not entirely sure what they're called in the first place.
Secondly, how can I calculate which nodes fall within my parameters? I'm not necessarily looking for a full solution, just some hints to get started; when I realized my implementation would require three Dictionaries, I began to think there might be an easier way of handling things.
For terminology, you're basically asking for all points within a certain distance on an arbitrary (positive) weighted graph. The use of differing weights means it no longer corresponds to a specific metric such as Manhattan distance.
As for algorithms, Dijkstra's algorithm is probably what you want. The basic idea is to maintain the minimum cost to each square that you've found so far, and a priority queue of the best squares to explore next.
Unlike traditional Dijkstra's where you keep going until you find the minimal path to every square, you'll want to stop adding nodes to the queue if the distance to them is too long. Once you're done, you'll have a list of all squares whose shortest path from the starting square is at most x, which sounds like what you want.
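A minimal sketch of that idea, assuming a grid where cost(x, y) is the cost of entering a cell and using .NET 6's PriorityQueue (on older frameworks a SortedSet or a hand-rolled binary heap works just as well); the names here are illustrative:

    using System;
    using System.Collections.Generic;

    static class ReachableCells
    {
        static readonly (int dx, int dy)[] Steps = { (1, 0), (-1, 0), (0, 1), (0, -1) };

        // Returns every cell whose cheapest path from 'start' costs at most 'budget',
        // together with that cheapest cost.
        public static Dictionary<(int x, int y), int> WithinBudget(
            (int x, int y) start, int budget, Func<int, int, int> cost, int width, int height)
        {
            var best = new Dictionary<(int x, int y), int> { [start] = 0 };
            var queue = new PriorityQueue<(int x, int y), int>();
            queue.Enqueue(start, 0);

            while (queue.TryDequeue(out var cell, out var dist))
            {
                if (dist > best[cell]) continue;          // stale queue entry, skip
                foreach (var (dx, dy) in Steps)
                {
                    var next = (x: cell.x + dx, y: cell.y + dy);
                    if (next.x < 0 || next.y < 0 || next.x >= width || next.y >= height) continue;
                    int d = dist + cost(next.x, next.y);
                    if (d > budget) continue;             // too expensive: don't expand past the budget
                    if (best.TryGetValue(next, out var old) && old <= d) continue;
                    best[next] = d;
                    queue.Enqueue(next, d);
                }
            }
            return best;
        }
    }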
Eric Lippert provides an excellent blog series on writing an A* pathfinding algorithm in C# here:
Part 1: http://blogs.msdn.com/b/ericlippert/archive/2007/10/02/path-finding-using-a-in-c-3-0.aspx
Part 2: http://blogs.msdn.com/b/ericlippert/archive/2007/10/04/path-finding-using-a-in-c-3-0-part-two.aspx
Part 3: http://blogs.msdn.com/b/ericlippert/archive/2007/10/08/path-finding-using-a-in-c-3-0-part-three.aspx
Part 4: http://blogs.msdn.com/b/ericlippert/archive/2007/10/10/path-finding-using-a-in-c-3-0-part-four.aspx
You are probably best off going with Dijkstra's algorithm on a weighted graph, as described here:
http://www.csl.mtu.edu/cs2321/www/newLectures/29_Weighted_Graphs_and_Dijkstra's_Algorithm.html
(There is algorithm description near the middle of the page.)
Manhattan distance in your case probably just means you don't want the diagonal paths in the graph.
From the presentation Graphs and Trees, on page 3, there is a visual presentation of what takes place during the Reingold-Tilford process; it also gives a brief summary of the algorithm beforehand: "...starts with bottom-up pass of the tree;
[finishes with] Top-down pass for assignment of final positions..." I can achieve both directional passes through recursive means, and I know that the Y-value(s) correspond to each node's generation level, but I'm still lost as to how the X-coordinates are solved.
I did come across this project: A Graph Tree Drawing Control for WPF, but there is so much code that I had great difficulty locating what should be a simple 2-3 methods defining the X-values. (I also have no experience with WPF.)
I have been searching and experimenting how to do this for several days now, so your help is much appreciated!
I found the articles listed in jwpat7's answer to be very useful, although it took me a while to figure out the exact logic needed for this algorithm, so I wrote my own blog post to simplify the explanation.
Here's the plain-text logic of how you determine the X node positions:
Start with a post-order traversal of the tree
Assign an initial X value to each node of 0 if it's the first in a set, or previousSibling + 1 if it's not.
If a node has children, find the desired X value that would center it over its children.
If the node is the left-most node, set its X to that value
If the node is not the left-most node, set a Mod property on the node to (X - centeredX) in order to shift all children so they're centered under this node. The last traversal of the tree will use this Mod property to determine the final X value of each node.
Determine if any of this node's children would overlap any children of siblings to the left of this node. Basically, for each Y level, get the largest and smallest X from the two subtrees and compare them.
If any collisions occur, shift the node over by however much needed. Shifting a subtree just requires adding to the X and Mod properties of the node.
If the node was shifted, also shift any nodes between the two subtrees that were overlapping so they are equally spaced out.
Do a check to ensure that when the final X is calculated, there are no negative X values. If any are found, add the largest one to the root node's X and Mod properties to shift the entire tree over.
Do a second traversal of the tree using pre-order traversal and add the sum of the Mod values from each of a node's ancestors to its X property.
The final X values of the tree above would look like this:
I have some more details and some sample code in my blog post, but it's too long to include everything here, and I wanted to focus on the logic of the algorithm instead of code specifics.
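As a taster, here is a heavily condensed sketch of the two passes. The collision-resolution and spacing steps are omitted for brevity (the blog post covers them), and the TreeNode class is just a placeholder:

    using System.Collections.Generic;
    using System.Linq;

    class TreeNode
    {
        public double X, Mod;                      // Mod shifts this node's whole subtree
        public List<TreeNode> Children = new List<TreeNode>();
    }

    static class TreeLayout
    {
        // First pass: post-order. Give each node a preliminary X and record, in Mod,
        // how far its children must be shifted so they end up centred beneath it.
        static void FirstPass(TreeNode node, TreeNode previousSibling)
        {
            TreeNode prevChild = null;
            foreach (var child in node.Children)
            {
                FirstPass(child, prevChild);
                prevChild = child;
            }

            node.X = previousSibling == null ? 0 : previousSibling.X + 1;
            if (node.Children.Count > 0)
            {
                double centre = (node.Children.First().X + node.Children.Last().X) / 2;
                if (previousSibling == null) node.X = centre;   // left-most node: move the node itself
                else node.Mod = node.X - centre;                // otherwise: shift its subtree via Mod
            }
            // ... the full algorithm also checks here for overlaps with the subtrees of
            //     left siblings and pushes X and Mod further right as needed ...
        }

        // Second pass: pre-order. Each node's final X is its preliminary X plus the
        // accumulated Mod values of all of its ancestors.
        static void SecondPass(TreeNode node, double modSum)
        {
            node.X += modSum;
            foreach (var child in node.Children)
                SecondPass(child, modSum + node.Mod);
        }

        public static void Layout(TreeNode root)
        {
            FirstPass(root, null);
            SecondPass(root, 0);
        }
    }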
A couple of articles are available that include code, in Python at billmill.org and in C on page 2 of a 1 February 1991 Dr. Dobb's Journal article. You have asked for “simple 2-3 methods” (perhaps meaning cookbook methods), but drawing trees nicely in all their generality is an NP-complete problem (see Supowit, K.J. and E.M. Reingold, "The complexity of drawing trees nicely," Acta Informatica 18, 4, January 1983, 377-392, ref. 4 in the DDJ article). The Reingold–Tilford method draws binary trees more or less nicely in linear time, and Buchheim's variation draws n-ary trees more or less nicely in linear time. However, the billmill article points out (shortly after stating Principle 6), “Every time so far that we've looked at a simple algorithm in this article, we've found it inadequate...”, so the likelihood of simpler methods working well is small.
I was wondering if anyone has knowledge of implementing pathfinding using scent: the 'enemy' moves towards whichever of the surrounding nodes has the strongest scent.
Thanks
Yes, I did my university final project on the subject.
One of the applications of this idea is for finding the shortest path.
The idea is that the 'scent', as you put it, will decay over time. But the shortest path between two points will have the strongest scent.
Have a look at this paper.
What did you want to know, exactly?
It's not quite clear what the question is in particular, but this just seems like another way of describing ant colony optimization:
In computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs.
Well, think about it for a minute.
My idea would be to divide the game field into sections of 32x32 (or whatever size your character is). Then run some checks every x seconds (so if they stay still, the tiles around them will have more 'scent') to figure out how strong the scent is on any given tile. Some examples might be: 1) if you cross over a tile, add 3; 2) if you crossed over an adjacent tile, add 1.
Then add things like degradation over time, reduce every tile by 1 every x seconds until it hits zero.
The last thing you will need to worry about is using AI to track this path. I would recommend just putting the AI somewhere and telling it to find a node with a scent, then go to an adjacent node with a higher or equal scent value. Also worry about crossing off paths already taken: if the player goes up a path and then comes back down it in another direction, make sure the AI doesn't just take the looped-back path.
The last thing to look at with the AI would be to add a bit of error. Make the AI take the wrong path every once in a while. Or lose the trail a little more easily.
Those are the key points, I'm sure you can come up with some more, with some more brainstorming.
Every game update (or some other, less frequent time frame), increase the scent value of nodes near to where the target objects (red blobs) are.
Decrease all node scent values by some fall-off amount to zero.
In the yellow blob's think/move function get available nodes to move to. Move towards the node with the highest scent value.
Depending on the number of nodes, the 'decrease all node scent values' step could do with optimisation, e.g. by maintaining a list of non-zero nodes to be decreased.
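A rough sketch of that update/decay/move loop in C#; the field and method names are made up for the example, and the deposit and fall-off amounts are things you would tune:

    using System;

    class ScentGrid
    {
        readonly float[,] scent;
        public ScentGrid(int width, int height) { scent = new float[width, height]; }

        // Called near the target's position every update (or some less frequent time frame).
        public void Deposit(int x, int y, float amount = 3f)
        {
            scent[x, y] += amount;
        }

        // Decay every tile towards zero by a fixed fall-off amount.
        public void Decay(float fallOff = 0.1f)
        {
            for (int x = 0; x < scent.GetLength(0); x++)
                for (int y = 0; y < scent.GetLength(1); y++)
                    scent[x, y] = Math.Max(0f, scent[x, y] - fallOff);
        }

        // The follower simply steps to whichever neighbouring tile smells strongest.
        public (int x, int y) BestNeighbour(int x, int y)
        {
            var best = (x, y);
            float bestScent = scent[x, y];
            foreach (var (dx, dy) in new[] { (1, 0), (-1, 0), (0, 1), (0, -1) })
            {
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= scent.GetLength(0) || ny >= scent.GetLength(1)) continue;
                if (scent[nx, ny] > bestScent) { best = (nx, ny); bestScent = scent[nx, ny]; }
            }
            return best;
        }
    }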
I see a big contradiction between the scent model and pathfinding. For a hunter in nature, finding the path by scent means finding exactly the path used by the followed subject. In games, pathfinding means finding the fastest path between two points. They are not the same thing.
1. While modelling the scent, you would compute the scent concentration at a point as the SUM of the surrounding concentrations, multiplied by different factors. Searching for the fastest path from a point means taking the MINIMUM of the times computed for the surrounding points, multiplied by different parameters.
2. When computing the scent you need a recursive model: scent spreads in all directions, including backwards. In pathfinding, once you have found the shortest paths for the points surrounding the target, they won't change.
3. The level of scent can rise and fall. In pathfinding, while searching for the minimum, the result can never rise.
So, the scent model is really much more complicated than what you need. Of course, what I have said is true only for the standard situation, and you may have something very special...