I am producing some simple network drawing software in C# (I shall write 'network' rather than 'graph' for clarity). Networks may eventually have multiple directed edges between some vertices/nodes. I have designed the data structure for this software as follows:
Each node and edge has a corresponding 'interactible' object that determines its visualisation and processes inputs such as clicks targeted at the visible object. These objects may store whatever data I choose about the underlying node or edge; for now they contain, for example, the endpoints of each edge, as well as an integer identifier for each object.
Nodes (and edges between them) are divided into connected components. I want to implement routines which merge these components upon addition of edges and detect disconnectedness when edges or nodes are deleted.
The overall data for the network, which shall for now simply record the number of edges between each pair of nodes (ignoring direction), is to be represented by an instance of an INetwork interface, which contains routines for adding and removing nodes and edges, identifying the neighbours of a vertex, and so on.
My question, then, is how to actually implement this INetwork interface. The implementation can be made to vary with the sparseness of the graph, but I would like to know what would be most efficient in terms of memory, processing speed and so on.
More precisely, I am aware that I could just produce an adjacency matrix for each component, or similarly an adjacency list, but which types are best suited to these roles in C#, bearing in mind that my nodes have integer identifiers which I would expect are best left constant throughout?
Would Dictionary<int,Dictionary<int,int>> ever be a good way of implementing an adjacency count, so as to be able to remove entries efficiently? Would a large 2-dimensional array work given the node indexing? If I instead store a matrix as a 1-dimensional array, say as an int[] with some methods for treating it as a 2- or 3-dimensional array, would this be better in any meaningful way?
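To make the idea concrete, here is the rough shape I have in mind for the nested-dictionary option (just a sketch, not settled code), with the inner value holding a count of parallel edges so that removals stay cheap:

```csharp
using System.Collections.Generic;

// Rough sketch of the nested-dictionary idea: adjacency counts keyed by the
// constant integer node identifiers. Direction is ignored, so each edge is
// recorded under both endpoints.
class AdjacencyCounts
{
    private readonly Dictionary<int, Dictionary<int, int>> counts =
        new Dictionary<int, Dictionary<int, int>>();

    public void AddNode(int id)
    {
        if (!counts.ContainsKey(id))
            counts[id] = new Dictionary<int, int>();
    }

    public void AddEdge(int a, int b)
    {
        AddNode(a);
        AddNode(b);
        counts[a][b] = counts[a].TryGetValue(b, out int n) ? n + 1 : 1;
        counts[b][a] = counts[a][b];
    }

    // Returns true if this was the last edge between a and b.
    public bool RemoveEdge(int a, int b)
    {
        int remaining = counts[a][b] - 1;
        if (remaining == 0)
        {
            counts[a].Remove(b);
            counts[b].Remove(a);
            return true;
        }
        counts[a][b] = remaining;
        counts[b][a] = remaining;
        return false;
    }

    public IEnumerable<int> Neighbours(int id) => counts[id].Keys;
}
```

Memory then stays proportional to the number of adjacent pairs actually present, and removals are average-case O(1), at the cost of some per-entry overhead compared with a flat matrix.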
Of course, there are pros and cons to every implementation, but I expect the most taxing routine will always be the one that checks whether the removal of an edge has resulted in the network becoming disconnected. Thus I want whatever implementation I use to be able to quickly check:
Whether a removed edge was the last edge between its pair of vertices.
If so, to be able to quickly identify the neighbours of each node in turn, so as to determine which nodes lie in which of the resulting components.
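For the second point, I picture a breadth-first search over whatever neighbour lookup the chosen implementation provides; this sketch builds on the hypothetical AdjacencyCounts class above, and the set it returns is exactly the node set of the component containing the start node:

```csharp
using System.Collections.Generic;

static class Connectivity
{
    // Sketch: after RemoveEdge(a, b) reports that the last edge between a and b
    // is gone, walk the remaining adjacency from a. If b is never reached, the
    // network has split, and 'visited' is the new component containing a.
    public static HashSet<int> ReachableFrom(AdjacencyCounts network, int start)
    {
        var visited = new HashSet<int> { start };
        var queue = new Queue<int>();
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            int current = queue.Dequeue();
            foreach (int neighbour in network.Neighbours(current))
            {
                if (visited.Add(neighbour))
                    queue.Enqueue(neighbour);
            }
        }
        return visited;
    }
}

// Usage: bool hasSplit = !Connectivity.ReachableFrom(network, a).Contains(b);
```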
Thanks in advance for any and all suggestions. In case anyone wonders, the choice of C# was because of my reliance on the Unity game-making engine for its rendering and (possibly, later on) physics capabilities.
I have a collection of data points contained in List<Point4D> allPoints, where each Point4D point holds its x, y, z location in space (point.X, point.Y, point.Z) and its magnitude value (point.W). The data points represent individual points of stress on an object, and therefore there are various clusters of data points on the object in which the data points are in close proximity and have similar magnitudes.
I want to be able to identify where these clusters are and which data points they include. The user needs to be able to see the clusters and will (eventually) be able to filter them based on size/number of points/stress value magnitude, etc (this is not my main concern right now).
For now, I'd just like to be able to generate a sort of "bubble" around the data points included in each cluster, so that I can display each cluster individually.
I have tried implementing K-means but got stuck as I needed to know how many clusters there were beforehand (at least, this was a requirement in all the implementations I've found). For my purposes, I will not know how many clusters there are or where they are beforehand; this information varies depending on the current data set being analyzed (the data is imported from a .csv file uploaded by the user).
Any ideas would be greatly appreciated!
The usual way is to run k-means several times for different k, and pick the "best" by some heuristic such as the (stupid) elbow method. Better choices include VRC, but it should be very clear that there is no universally best k, and your application may be an example where you will likely want a larger k than the "best" found by such methods.
There are also variants such as x-means and g-means that try to "learn" k during clustering, largely by splitting clusters as long as some heuristic improves.
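As an illustration of the first suggestion, the outer loop in C# can be as simple as the sketch below; RunKMeans and WithinClusterSumOfSquares are placeholders for whichever k-means routine and scoring function you end up using, not library calls:

```csharp
using System.Collections.Generic;

static class ClusterCountSearch
{
    // Sketch: score several candidate values of k, then pick the "best" one by
    // the elbow of the curve (or by a criterion such as VRC computed from the
    // same runs).
    public static Dictionary<int, double> ScoreClusterCounts(List<Point4D> allPoints, int maxK)
    {
        var scores = new Dictionary<int, double>();
        for (int k = 1; k <= maxK; k++)
        {
            int[] assignment = RunKMeans(allPoints, k);                   // cluster index per point
            scores[k] = WithinClusterSumOfSquares(allPoints, assignment); // lower = tighter clusters
        }
        // The "elbow" is the k after which the score stops improving sharply.
        return scores;
    }
}
```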
I'm trying to optimize a force-directed graph. So far I have implemented it using the naïve O(n^2) method. It can only handle around 1000 nodes, which is far too few for my needs.
Those of you familiar with the algorithm know that it has two main components: repulsion between nodes in Coulomb's-law fashion, and spring-like attraction along edges using Hooke's law. Both of these components involve pairwise calculations between nodes.
The Barnes-Hut algorithm works nicely for the former, bringing the repulsion component down to O(n log n). However, I have not been able to find something similar for the spring component. I have considered the following:
Dividing the nodes based on location into overlapping bins and performing pairwise calculations only between nodes in the same bin. However, this might not work in all cases, especially since the initial configuration of the nodes is random and connected nodes could be anywhere. I could change how I generate the nodes but unless they are all in the same bin it would still produce incorrect results.
Storing the edges separately and iterating through them to calculate the spring forces. Right now this looks to be the most promising method to me.
Is there a better way I haven't considered? If it matters at all, I'm using C# and it'd be nice if it were trivial to throw in a parallel loop.
I feel the second option that you gave should work with linear complexity in terms of the number of edges. Just iterate through them and keep updating the resultant forces on the two nodes at each edge's endpoints.
EDIT: Sorry, I earlier thought that each node was connected to every other node by a spring.
If I understood you correctly, you have O(n log n) for the repulsion component, and the attraction component is sparse: for each node you have on average k << n spring-like attractions. If so, you can take care of the attraction component in O(n * k) by storing the attractions in an adjacency list instead of an adjacency matrix.
In the end, I implemented what I described in my second case and it worked well. I maintained and iterated through a collection of neighbours for each node, which allowed the entire acceleration routine to be easily parallelized (along with Barnes-Hut). Overall time complexity is O(max(n log n, k)) where k is the total number of edges. My C# implementation handles around 100000 nodes and 150000 edges at an acceptable level of performance.
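For reference, the per-node loop has roughly the following shape (a simplified sketch with made-up types, not the exact code); since each iteration only writes to its own node's force accumulator, it drops straight into Parallel.ForEach. Repulsion still goes through Barnes-Hut; only the spring pass is shown here.

```csharp
using System.Collections.Generic;
using System.Numerics;
using System.Threading.Tasks;

// Simplified sketch: each node keeps its own neighbour list, so the spring pass
// parallelises cleanly; every iteration writes only to its own node's accumulator.
class Node
{
    public Vector2 Position;
    public Vector2 Force;
    public List<Node> Neighbours = new List<Node>();
}

static class Springs
{
    public static void AccumulateSpringForces(IList<Node> nodes, float springConstant, float restLength)
    {
        Parallel.ForEach(nodes, node =>
        {
            foreach (Node other in node.Neighbours)
            {
                Vector2 delta = other.Position - node.Position;
                float distance = delta.Length();
                if (distance < 1e-6f) continue;                  // coincident nodes: skip
                // Hooke's law: force proportional to displacement from the rest length.
                float magnitude = springConstant * (distance - restLength);
                node.Force += (delta / distance) * magnitude;    // only this node is written to
            }
        });
    }
}
```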
I have a large collection of 2D objects, only lines for now.
I need an algorithm suggestion for building the fastest possible spatial index over this collection, so that I can collect all objects that lie inside some bounds.
Once built, the index will not be updated.
The object distribution in this database is not spatially uniform.
The implementation will be in C#.
Update: The current use case is the road graph of a country, so the lines are short, running from one crossroad to the next, with higher density in populated areas. I think this gives a good picture of the data.
Obviously there are many indexing methods that could achieve this, but I need the one that is fastest for this workload.
You can use a segment tree if you want to store 2D lines and your queries are 2D range queries.
The algorithmic complexity of a query is O(log^2 N).
Check out quadtrees, and DotSpatial for spatial type handling, including a quadtree implementation.
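To illustrate the idea (my own minimal types here, not DotSpatial's API), a quadtree over the lines' bounding boxes could look roughly like this:

```csharp
using System.Collections.Generic;

// Minimal quadtree sketch (not DotSpatial's API). Lines are indexed by their
// axis-aligned bounding boxes; an item that straddles a child boundary stays
// at the current node.
struct Rect
{
    public double MinX, MinY, MaxX, MaxY;

    public bool Intersects(Rect o) =>
        MinX <= o.MaxX && o.MinX <= MaxX && MinY <= o.MaxY && o.MinY <= MaxY;

    public bool Contains(Rect o) =>
        MinX <= o.MinX && o.MaxX <= MaxX && MinY <= o.MinY && o.MaxY <= MaxY;
}

class QuadTree<T>
{
    private const int MaxItems = 16;
    private readonly Rect bounds;
    private readonly List<(Rect Box, T Item)> items = new List<(Rect, T)>();
    private QuadTree<T>[] children;

    public QuadTree(Rect bounds) { this.bounds = bounds; }

    public void Insert(Rect box, T item)
    {
        // Split lazily; for brevity, items already stored here are not redistributed.
        if (children == null && items.Count >= MaxItems) Split();

        if (children != null)
            foreach (var child in children)
                if (child.bounds.Contains(box)) { child.Insert(box, item); return; }

        items.Add((box, item));   // straddles child boundaries (or node is still a leaf)
    }

    public void Query(Rect area, List<T> results)
    {
        if (!bounds.Intersects(area)) return;

        foreach (var (box, item) in items)
            if (box.Intersects(area)) results.Add(item);

        if (children != null)
            foreach (var child in children) child.Query(area, results);
    }

    private void Split()
    {
        double midX = (bounds.MinX + bounds.MaxX) / 2;
        double midY = (bounds.MinY + bounds.MaxY) / 2;
        children = new[]
        {
            new QuadTree<T>(new Rect { MinX = bounds.MinX, MinY = bounds.MinY, MaxX = midX,        MaxY = midY }),
            new QuadTree<T>(new Rect { MinX = midX,        MinY = bounds.MinY, MaxX = bounds.MaxX, MaxY = midY }),
            new QuadTree<T>(new Rect { MinX = bounds.MinX, MinY = midY,        MaxX = midX,        MaxY = bounds.MaxY }),
            new QuadTree<T>(new Rect { MinX = midX,        MinY = midY,        MaxX = bounds.MaxX, MaxY = bounds.MaxY }),
        };
    }
}
```

Since the index is built once and never updated, the capacity and depth can be tuned to the road data up front; for a non-uniform distribution like this, a quadtree (or an R-tree, as suggested below) adapts to the dense populated areas better than a uniform grid.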
You can also try an R-tree. There's a C# implementation available at http://sourceforge.net/projects/cspatialindexrt/.
R-trees should offer about the same kind of performance as a segment tree, and the above implementation should be stand-alone and fairly free of extra code dependencies, but I haven't tested it.
There is no silver bullet for this. It depends on the type of data (e.g., only points, only lines, triangles, meshes, any combination of them, etc.) and the type of query (point inside polygon, line intersection, nearest neighbors, any geometry inside a circle or box, etc.).
Each data structure is designed for a specific type of query and data. If you want to use a single data structure for all types of queries and all types of data, you have to trade off space, time, or both. You can get reasonably close to being fast, but you won't be optimal in general.
In my experience, for a data structure general enough to cope with most geometric objects and able to handle several types of queries, I would recommend the AABB tree:
https://doc.cgal.org/latest/AABB_tree/index.html
Can anyone suggest a fast, efficient method for storing and accessing a sparse octree?
Preferably something that can be easily implemented in HLSL. (I'm working on a raycasting/voxel app.)
In this instance, the tree can be precalculated, so I'm mostly concerned with size and search time.
Update
For anyone looking to do this, a more efficient solution may be to store the nodes as a linear octree generated with a Z-order curve/Morton ordering. Doing so eliminates storage of the inner nodes, but may require cross-referencing the linear tree array with a second "data texture" containing information about the individual voxels.
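For concreteness, the key construction is just bit interleaving; below is a generic sketch of a 30-bit Morton encoder (the usual magic-number bit spreading), with voxel lookup done by binary search over the sorted key array:

```csharp
static class Morton
{
    // Spread the low 10 bits of v so that there are two zero bits between each bit.
    static uint Part1By2(uint v)
    {
        v &= 0x000003FF;
        v = (v ^ (v << 16)) & 0xFF0000FF;
        v = (v ^ (v << 8))  & 0x0300F00F;
        v = (v ^ (v << 4))  & 0x030C30C3;
        v = (v ^ (v << 2))  & 0x09249249;
        return v;
    }

    // Interleave 10-bit x, y, z coordinates into a single 30-bit Morton key.
    public static uint Encode(uint x, uint y, uint z) =>
        Part1By2(x) | (Part1By2(y) << 1) | (Part1By2(z) << 2);
}

// Lookup: System.Array.BinarySearch(sortedKeys, Morton.Encode(x, y, z)) returns the
// index into the parallel voxel-data array, or a negative value for an empty voxel.
```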
I'm not very experienced with HLSL, so I'm not sure this will meet your needs, but here are my thoughts. Let me know if something here is not sane for your needs - I'd like to discuss, so maybe I can learn something myself.
Every node in the octree can be stored as a vector4, where the (x, y, z) components represent the center point of the node. The w component can be used as a flags field.
a. The w-flags field can denote which octant child nodes follow the current node. This would require 8 bits of the value.
Each entity in your octree can be stored as a bounding box, where (r, g, b) hold the bounding box dimensions, and w can be used for whatever else you need.
Define a special vector denoting that an object list follows, for example when (w + z) equals some magic value. Some function of (x, y) could then give the number of objects that follow. Or whatever works.
a. Each node is potentially followed by this special vector, indicating that there are objects stored in the node. The next X vectors are all just object identifiers or something like that.
b. Alternatively, you could have one node that just specifies an in-memory object list. Again, not sure what you need here or the constraints on how to access objects.
So, first, build the octree and stuff it with your objects. Then, just walk the octree, outputting the vectors to a memory buffer.
I'm thinking that a 512x512 texture can hold a fully packed octree 5 levels deep (32,768 nodes), each containing 8 objects. Or, a fully packed 4-level octree with 64 objects each.
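To sketch the CPU-side packing (my own simplified layout, purely illustrative): walking the tree and appending one float4 per node, with the child mask stuffed into w, could look like this before the buffer is uploaded as a texture:

```csharp
using System.Collections.Generic;
using System.Numerics;

// Illustrative packing only: one Vector4 per node, (x, y, z) = node center,
// w = 8-bit mask of which octant children exist (as described above).
class OctreeNode
{
    public Vector3 Center;
    public OctreeNode[] Children = new OctreeNode[8];   // null where no child exists
}

static class OctreePacker
{
    public static void Flatten(OctreeNode node, List<Vector4> buffer)
    {
        int childMask = 0;
        for (int i = 0; i < 8; i++)
            if (node.Children[i] != null) childMask |= 1 << i;

        buffer.Add(new Vector4(node.Center, childMask));   // w carries the flags

        // Each existing child's subtree follows depth-first, in octant order.
        // A practical layout would also store an offset or subtree size per node
        // so the shader can skip to siblings.
        for (int i = 0; i < 8; i++)
            if (node.Children[i] != null) Flatten(node.Children[i], buffer);
    }
}
```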
There is a great article about sparse octrees focusing on GPUs: Efficient Sparse Voxel Octrees – Analysis, Extensions, and Implementation
I want to use SIFT/SURF for template matching. The image can have 1...n targets.
Using SURF/SIFT, only one target can be extracted. One idea is to segment the image into many regions and then look for SIFT/SURF matches in each segment. This works, but obviously it is not ideal because of speed and effort. Is there an alternative approach? Does anyone have source code for scale- and rotation-invariant template matching?
regards,
If I understand correctly what you are saying (please provide more information), you have N planar image objects. You want to extract SIFT/SURF features from the N images and put all of the features in some sort of container (an array, or an acceleration data structure for high-dimensional nearest neighbors). When you process a given image, you extract SIFT (or SURF) features and search, for every feature, for its closest feature in the container. You end up with a list of pairs (feature from the current image, feature from the container).

Now you have to apply some robust model estimator (RANSAC, for example) to construct the homography. If a good homography can be found (with at least 10-12 inliers), you can be confident that your target is there. Obviously, given the array of feature pairs, you subdivide it into groups, where each group corresponds to one of the N planar image objects in your database (this is not the best way to do it; you should probably associate each feature extracted from the current image with k features from the database and use some form of voting scheme to establish which pairs are valid, but doing so makes things more complicated). A rough code sketch of this pipeline follows at the end of this answer.
So, generally speaking, you have to make some decisions:
which features to use (SIFT? SURF? others?)
which robust model estimator to use (RANSAC? PROSAC? MLESAC?)
which geometric considerations to use when computing the homography (take advantage of the fact that the homography relates points in two planar objects)
which multi-dimensional data structure you will use to accelerate the search
how to compute the homography (well, probably there is only one way: normalized DLT)
If your objects are NOT planar, the problem is more difficult, since the appearance of a rigid 3D object changes as the viewpoint changes. To describe it, you will need K images instead of only one. This is a lot more challenging, because as N and K grow, recognition rates drop. There are probably other, better ways; I strongly suggest searching the relevant literature on Google.
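To make the pipeline concrete, here is a rough sketch using the OpenCvSharp wrapper (treat the exact class and enum names as approximate, since they vary between versions; SURF lives in the contrib/xfeatures2d module, and SIFT or ORB would slot in the same way):

```csharp
using System.Linq;
using OpenCvSharp;
using OpenCvSharp.XFeatures2D;

static class TargetFinder
{
    // Rough sketch: detect features in the template and in the scene, match them,
    // then let RANSAC estimate a homography and count the inliers.
    public static bool FindTarget(Mat template, Mat scene, out Mat homography)
    {
        var surf = SURF.Create(400);                       // Hessian threshold

        var templateDesc = new Mat();
        var sceneDesc = new Mat();
        surf.DetectAndCompute(template, null, out KeyPoint[] templateKeys, templateDesc);
        surf.DetectAndCompute(scene, null, out KeyPoint[] sceneKeys, sceneDesc);

        var matcher = new BFMatcher(NormTypes.L2);
        DMatch[] matches = matcher.Match(templateDesc, sceneDesc);

        // Matched keypoint coordinates, paired up for the homography estimation.
        Point2f[] src = matches.Select(m => templateKeys[m.QueryIdx].Pt).ToArray();
        Point2f[] dst = matches.Select(m => sceneKeys[m.TrainIdx].Pt).ToArray();

        var inlierMask = new Mat();
        homography = Cv2.FindHomography(InputArray.Create(src), InputArray.Create(dst),
                                        HomographyMethods.Ransac, 3, inlierMask);

        // As noted above, roughly 10-12 inliers is a reasonable sanity threshold.
        return !homography.Empty() && Cv2.CountNonZero(inlierMask) >= 10;
    }
}
```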