I was unsure how to phrase the title. I am searching for a good data structure/algorithm combination for the following problem:
I have about 20000 objects, each containing a set of values (integers) and an XYZ position (doubles). I want to find the object that best fits the following conditions:
1. Needs to be within a maximum distance from a given starting point
2. It has to be reachable from the given starting point by "hopping" (i.e. "travelling" from one object's XYZ position to another object's position) without any single hop being longer than a given threshold
3. It has to have the maximum value at a given position in its set of integers
At first I thought about graph theory and pathfinding, but it does not seem to fit well. I have no distinct edges, so I would have to link every point with every other point and use the distance as the edge weight. This would result in a lot(!) of edges (about 20000²/2, i.e. roughly 200 million). The second problem is that pathfinding (if I am not mistaken) only uses one search criterion (usually distance or cost), but I would need multiple criteria (distance, hop limit, int value).
Any thoughts and also any good libraries to solve this well?
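As one possible sketch (not taken from any answer here), the all-pairs edge problem can be avoided by never building edges at all: bucket the objects in a uniform 3D grid whose cell size equals the hop threshold, so every allowed hop lands in one of the 27 surrounding cells, and run a breadth-first search from the starting point. The function name `best_reachable`, its parameters, and the assumption that only the final object (not the intermediate hops) must lie within the maximum distance are illustrative choices, not part of the question.

```python
import math
from collections import deque, defaultdict


def best_reachable(positions, values, start, max_dist, hop, key):
    """Index of the best object reachable from `start`, or None.

    positions: list of (x, y, z); values: list of integer sequences;
    key: which entry of each value sequence to maximise.
    Assumption: intermediate hops may leave the max_dist sphere.
    """
    def cell_of(p):
        # Cell size == hop, so any point within `hop` lies in an adjacent cell.
        return tuple(math.floor(c / hop) for c in p)

    grid = defaultdict(list)
    for i, p in enumerate(positions):
        grid[cell_of(p)].append(i)

    def within_hop(p):
        cx, cy, cz = cell_of(p)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        if math.dist(p, positions[j]) <= hop:
                            yield j

    # Breadth-first search over implicit edges: no edge list is ever built.
    visited = set(within_hop(start))
    queue = deque(visited)
    best = None
    while queue:
        i = queue.popleft()
        if math.dist(start, positions[i]) <= max_dist:
            if best is None or values[i][key] > values[best][key]:
                best = i
        for j in within_hop(positions[i]):
            if j not in visited:
                visited.add(j)
                queue.append(j)
    return best
```

Each object is enqueued at most once and each visit only scans nearby cells, so for reasonably spread-out objects the work stays far below that of an all-pairs edge list.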
This might be a weird application.
The brief description of the problem is "How to get absolute coordinates of nodes based on relative positions (distances)?"
We have a number of nodes (each with a unique ID) and, as input, a list specifying each node's adjacent nodes and the distance to each of them.
The required output would be one possible way to lay out these nodes on a 2D Surface.
The resulting algorithm is going to be used in C#, so external .NET libraries might help too.
It would be a great help if you could suggest an approach.
Thank you in advance.
You need the coordinates of at least three known points to start with.
Way I. If the known points are adjacent, the process is simple: you loop over all your points, looking for those that have three known points in their lists. Use two of them to compute the two possible positions (the intersections of the two distance circles), then use the third to choose between them (the "right" or "left" variant). Repeat the loops until a loop adds no new points.
That simple algorithm has bad convergence: the errors accumulate and far points could get bad coordinates. But if your coordinates are integers, you can round them after each computation and keep them accurate.
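A minimal sketch of Way I, assuming 2D points stored as (x, y) tuples and each node's distances stored as a dict of neighbour -> distance (both hypothetical representations, as are the function names): intersect the circles around two placed neighbours and keep the candidate whose distance to the third neighbour fits best. For integer coordinates you could round the returned position before storing it.

```python
import math


def circle_intersections(p0, r0, p1, r1):
    """Intersections of two circles: [] if none, else two points (equal if tangent)."""
    (x0, y0), (x1, y1) = p0, p1
    d = math.hypot(x1 - x0, y1 - y0)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    a = (r0 * r0 - r1 * r1 + d * d) / (2 * d)   # distance from p0 to the chord midpoint
    h = math.sqrt(max(r0 * r0 - a * a, 0.0))    # half the chord length
    mx, my = x0 + a * (x1 - x0) / d, y0 + a * (y1 - y0) / d
    ox, oy = -h * (y1 - y0) / d, h * (x1 - x0) / d
    return [(mx + ox, my + oy), (mx - ox, my - oy)]


def place_from_three(known, dists):
    """Two circles give two candidates; the third distance picks the right one."""
    cands = circle_intersections(known[0], dists[0], known[1], dists[1])
    if not cands:
        raise ValueError("inconsistent distances")
    return min(cands, key=lambda c: abs(math.dist(c, known[2]) - dists[2]))


def propagate(positions, adjacency):
    """Way I loop. positions: node -> (x, y) for placed nodes (mutated in place);
    adjacency: node -> {neighbour: distance}. Repeats until a pass adds nothing."""
    added = True
    while added:
        added = False
        for node, nbrs in adjacency.items():
            if node in positions:
                continue
            ref = [n for n in nbrs if n in positions][:3]
            if len(ref) == 3:
                positions[node] = place_from_three([positions[n] for n in ref],
                                                   [nbrs[n] for n in ref])
                added = True
    return positions
```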
Way II. If the known points are not adjacent to each other, the process is more complicated.
Let's say, you have start known points A,B,C.
Take A and one of its adjacent points, D. Place D anywhere at the correct distance from A.
Find some point E adjacent to A and D. Choose any of two possible positions.
Starting from A, D, E, use the way I.
When the propagation reaches the second known start point, say B, it will of course be in the wrong place. Rotate the whole net you have built around A so that B gets its correct coordinates. Continue the looping.
When you reach the last of the known start points, C, it will either be placed correctly or not. If not, mirror the whole net about the AB axis, and C will be placed correctly. (If it still is not, your data is bad.) Continue the Way I looping until the end.
Both of these ways work if you have long distance lists for all points. If points have only a few distances given, the task becomes much, much more complicated.
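The two Way II corrections are plain rigid transforms. A minimal sketch, assuming the net is a dict of node -> (x, y) and that A already sits at its known coordinates (all helper names are hypothetical): rotate about A by the angle between B's current and known directions, then reflect across the line AB if C still disagrees.

```python
import math


def rotate_about(net, centre, angle):
    """Rotate every point of the net by `angle` radians around `centre`."""
    cx, cy = centre
    c, s = math.cos(angle), math.sin(angle)
    return {k: (cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
            for k, (x, y) in net.items()}


def mirror_about(net, p, q):
    """Reflect every point of the net across the line through p and q."""
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    n = dx * dx + dy * dy
    out = {}
    for k, (x, y) in net.items():
        t = ((x - px) * dx + (y - py) * dy) / n
        fx, fy = px + t * dx, py + t * dy          # foot of the perpendicular
        out[k] = (2 * fx - x, 2 * fy - y)
    return out


def align_b(net, a, b, b_known):
    """Turn the net around A so that B lands on its known coordinates."""
    ax, ay = net[a]
    bx, by = net[b]
    angle = math.atan2(b_known[1] - ay, b_known[0] - ax) - math.atan2(by - ay, bx - ax)
    return rotate_about(net, net[a], angle)


def check_c(net, a, b, c, c_known, tol=1e-6):
    """Mirror across AB if C is on the wrong side; fail if that does not help either."""
    if math.dist(net[c], c_known) > tol:
        net = mirror_about(net, net[a], net[b])
        if math.dist(net[c], c_known) > tol:
            raise ValueError("bad data: C cannot be matched")
    return net
```

Rotating about A keeps the distance AB unchanged, so with consistent data B lands exactly on its known position.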
Given a Delaunay triangulation of a point set, how should I index my triangulation to do quick point location?
I'm currently looping over all the triangles. For each triangle, I'm checking if the given point is within triangle's bounding rectangle. If it is, I then check the triangle using geometry equations.
This is slow. Any ideas of how to make this search more efficient?
Mission accomplished; this is the way I ended up doing it:
1) Check if the point lies within the triangle's bounding rectangle.
2) Use the point as the start of a horizontal line that ends at the maximum width.
3) Check the triangles found in (1) for intersections with the line from (2).
4) If a triangle intersects the line, count how many times the horizontal line crosses the triangle's edges.
5) If it crosses once, the point is in the triangle; otherwise (zero or two crossings), it is not.
Reference:
Fast generation of points inside triangulated objects obtained by cross-sectional contours
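The steps above amount to a ray-casting (crossing-number) test specialised to triangles. A minimal sketch, assuming triangles are tuples of three (x, y) vertices; the half-open comparison on y is a common way to avoid counting a crossing twice when the ray passes through a vertex, and the function names are hypothetical. Note that this still loops over every triangle; only the cheap bounding-box check keeps it fast.

```python
def in_bbox(p, tri):
    """Step 1: is p inside the triangle's bounding rectangle?"""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)


def ray_crossings(p, tri):
    """Steps 2 to 4: count edges crossed by the horizontal ray from p towards +x."""
    x, y = p
    count = 0
    for i in range(3):
        (x1, y1), (x2, y2) = tri[i], tri[(i + 1) % 3]
        if (y1 > y) != (y2 > y):                        # edge spans the ray's height
            xi = x1 + (y - y1) * (x2 - x1) / (y2 - y1)  # where the edge meets that height
            if xi > x:
                count += 1
    return count


def locate_point(p, triangles):
    """Step 5: an odd count (1 for a triangle) means inside; return that triangle's index."""
    for idx, tri in enumerate(triangles):
        if in_bbox(p, tri) and ray_crossings(p, tri) % 2 == 1:
            return idx
    return None
```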
Ranging from quick and practical to theoretically robust, here are three approaches you could use:
Construct a regular grid where each cell contains a list of the triangles that intersect it. Given a query point, determine the cell that contains it in constant time, then compare your query point against only the triangles in that cell's list.
Construct a quadtree where each leaf cell contains the triangles that intersect it. Localizing the query point to a quadtree leaf takes logarithmic time, but this can be more efficient overall in both speed and memory.
Sweep a horizontal line down across all the triangles. The vertices of your point set correspond to events: at each event, some triangles begin intersecting the sweepline and others stop intersecting it. You can use an immutable (aka persistent) sorted map data structure to represent this efficiently: map<double, sweepstate>, where the key is the y-intercept of the sweepline at an event and sweepstate is a sorted list of line segment pairs (corresponding to the left and right sides of triangles). Given a query point, you first use its y value to look up a sweepstate, and then you do a single trapezoid containment test. (Two horizontal sweeplines and the two line segments between them form a trapezoid.)
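A minimal sketch of the first (regular grid) approach, assuming triangles are tuples of three (x, y) vertices and a hand-picked cell size. As a simplification, each triangle is registered in every cell its bounding box overlaps, which is a superset of the cells it actually intersects; a same-sign cross-product test then settles containment. The class and method names are hypothetical.

```python
import math
from collections import defaultdict


def point_in_triangle(p, a, b, c):
    """Same-sign test on the three edge cross products (boundary counts as inside)."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


class TriangleGrid:
    def __init__(self, triangles, cell_size):
        self.tris = triangles
        self.h = cell_size
        self.cells = defaultdict(list)
        for idx, tri in enumerate(triangles):
            xs = [v[0] for v in tri]
            ys = [v[1] for v in tri]
            # Conservative: every cell overlapped by the triangle's bounding box.
            for i in range(self._cell(min(xs)), self._cell(max(xs)) + 1):
                for j in range(self._cell(min(ys)), self._cell(max(ys)) + 1):
                    self.cells[(i, j)].append(idx)

    def _cell(self, coord):
        return math.floor(coord / self.h)

    def locate(self, p):
        """Constant-time cell lookup, then test only that cell's triangles."""
        for idx in self.cells.get((self._cell(p[0]), self._cell(p[1])), ()):
            if point_in_triangle(p, *self.tris[idx]):
                return idx
        return None
```

A cell size around the typical triangle size keeps both the per-cell lists and the number of cells per triangle small.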
A common approach to this point location problem is the efficient trapezoidal decomposition. It reduces the query time to O(log N) per point, after O(N log N) preprocessing time, using O(N) space.
It could also be that the distribution of your query points allows alternative/simpler approaches.
A solution is a hierarchical tree, i.e. a dendrogram from hierarchical clustering, for example using the Euclidean distance: http://en.m.wikipedia.org/wiki/Hierarchical_clustering. Or you can use a metric tree.
I tried to find an algorithm for the following problem, but I couldn't.
You have a matrix, 10x6 if it matters (10 in the x dimension, 6 in the y dimension).
The algorithm receives two points, the opening point and the target point.
The array is full of 0s and 1s, and the algorithm should find the shortest path of 1s between the two points and return the first point in this path (the next point on the way to the target).
But here's the catch:
Each point can only get the values of the following points:
the point above it.
the point underneath it.
the point left to it.
the point right to it.
And to make things even harder: the values of the other points may differ depending on the opening point. For example:
the opening point is 0,0: the value of 0,1 is 1;
the opening point is 0,2: the value of 0,1 is 0.
I can calculate the values, so that part shouldn't matter to you...
So I thought the only way to solve it is with recursion because of the last condition but if you find another way, you're welcome.
The solution should be in Lua, C# or Java.
You can simply interpret your matrix as a graph. Every cell (i,j) corresponds to a node v(i,j), and two nodes are connected if and only if their corresponding cells are neighbors and both are set to 1.
The example matrix below has the four vertices v(0,0), v(0,1), v(1,0), and v(1,1), with edges {v(0,0),v(0,1)} and {v(0,1),v(1,1)} (the vertex v(1,0) is isolated).
1 1
0 1
As your graph is unweighted, you can simply use a breadth-first search (BFS) to find a shortest path. For pseudocode see: http://en.wikipedia.org/wiki/Breadth-first_search#Pseudocode
Your restriction that every entry in the matrix only knows its neighboring entries does not matter. In graph terms, this means that every vertex knows its neighbors, which is exactly what you need in the BFS. Using a different graph when searching from different starting points does not make the problem harder either.
Just two comments on the pseudocode linked above:
It only checks whether there is a connection or not. If you actually want to have the shortest path, you need to change the following. When a new vertex u is added to the queue when seen from its neighbor t, you have to store a link at u pointing to t. When you finally found your target, following back the links gives you the shortest path.
Using a set to store which elements are already visited is inefficient. In your case, just use a boolean matrix of the same size as your input matrix to mark vertices visited.
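A minimal BFS sketch with both changes applied: parent links to recover the path, and a boolean matrix for the visited marks. It assumes the matrix for the current opening point has already been computed (as the question says it can be) and is passed in as a list of lists of 0/1, indexed [row][col]; `next_step` is a hypothetical name.

```python
from collections import deque


def next_step(grid, start, target):
    """First cell after `start` on a shortest 4-connected path of 1s, or None."""
    rows, cols = len(grid), len(grid[0])
    if start == target:
        return start
    visited = [[False] * cols for _ in range(rows)]   # boolean matrix instead of a set
    parent = {}                                        # link from each cell to the cell it was reached from
    visited[start[0]][start[1]] = True
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == target:
            # Follow the links back until the cell whose parent is the start.
            cell = target
            while parent[cell] != start:
                cell = parent[cell]
            return cell
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # above, below, left, right
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc] and grid[nr][nc] == 1:
                visited[nr][nc] = True
                parent[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None
```

On the 2x2 example matrix, `next_step([[1, 1], [0, 1]], (0, 0), (1, 1))` returns `(0, 1)`.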
I am using the SURF algorithm in C# (OpenSurf) to get a list of interest points from an image. Each of these interest points contains a descriptor vector, an x coordinate (int), a y coordinate (int), the scale (float) and the orientation (float).
Now I want to compare the interest points from one image to a list of images in a database, which also have lists of interest points, to find the most similar image. That is: [Image(I.P.)] COMPARETO [List of Images(I.P.)] => best match. Comparing the images one by one yields unsatisfactory results.
When searching Stack Overflow and other sites, the best solution I have found is to build a FLANN index while keeping track of which image each interest point comes from. But before implementing it, I have some questions that puzzle me:
1) When matching images based on their SURF interest points, an algorithm I found does the matching by comparing their distances (x1,y1 -> x2,y2) to each other and finding the image with the lowest total distance. Are the descriptors or orientation never used when comparing interest points?
2) If the descriptors are used, then how do I compare them? I can't figure out how to compare X vectors of 64 values (one image) with Y vectors of 64 values (several images) using an indexed tree.
I would really appreciate some help. All the places I have searched and the APIs I found only support matching one picture to another, not matching one picture efficiently against a list of pictures.
There are multiple things here.
In order to know that two images are (almost) equal, you have to find the homographic projection between the two such that the projection results in a minimal error between the projected feature locations. Brute-forcing that is possible but not efficient, so a trick is to assume that similar images tend to have their feature locations in the same spots as well (give or take a bit). For example, when stitching images, the images to stitch are usually taken from only a slightly different angle and/or location; even if not, the distances will likely grow ("proportionally") with the difference in orientation.
This means that you can, as a broad phase, select candidate images by finding the k pairs of points with minimum spatial distance (the k nearest neighbors) between all pairs of images, and perform homography only on these points. Only then do you compare the projected point-pairwise spatial distance and sort the images by that distance; the lowest distance implies the best possible match (given the circumstances).
If I'm not mistaken, the descriptors are oriented by the strongest angle in the angle histogram. That means you may also decide to take the Euclidean (L2) distance of the 64- or 128-dimensional feature descriptors directly to obtain the actual feature-space similarity of two given features, and perform homography on the best k candidates. (You will not compare the scale at which the descriptors were found, though, because that would defeat the purpose of scale invariance.)
Both options are time-consuming and depend directly on the number of images and features; in other words: stupid idea.
Approximate Nearest Neighbors
A neat trick is to not use actual distances at all, but approximate distances instead. In other words, you want an approximate nearest neighbor algorithm, and FLANN (although not for .NET) would be one of them.
One key point here is the projection search algorithm. It works like this:
Assume you want to compare the descriptors in 64-dimensional feature space. You generate a random 64-dimensional vector and normalize it, resulting in an arbitrary unit vector in feature space; let's call it A. Now (during indexing) you form the dot product of each descriptor with this vector. This projects each 64-d vector onto A, resulting in a single real number a_n. (This value a_n represents the signed distance of the descriptor's projection along A from A's origin.)
This image I borrowed from this answer on CrossValidated regarding PCA demonstrates it visually; think of the rotation as the result of different random choices of A, where the red dots correspond to the projections (and thus the scalars a_n). The red lines show the error you make by using this approach; this is what makes the search approximate.
You will need A again for the search, so you store it. You also keep track of each projected value a_n and the descriptor it came from; furthermore, you arrange the a_n values (each with a link to its descriptor) in a list sorted by a_n.
To clarify using another image from here, we're interested in the location of the projected points along the axis A:
The values a_0 .. a_3 of the 4 projected points in the image are approximately sqrt(0.5²+1.5²)=1.58, sqrt(0.4²+1.1²)=1.17, -0.84 and -0.95, corresponding to their distance to A's origin.
If you now want to find similar images, you do the same: project each descriptor onto A, resulting in a scalar q (query). Now you go to the position of q in the sorted list (a binary search) and take the k surrounding entries. These are your approximate nearest neighbors. Now compute the feature-space distance of these k values and sort by lowest distance; the top ones are your best candidates.
Coming back to the last picture, assume the topmost point is our query. Its projection is 1.58 and its approximate nearest neighbor (of the four projected points) is the one at 1.17. They're not really close in feature space, but given that we just compared two 64-dimensional vectors using only two values, it's not that bad either.
You can see the limits there: similar projections do not at all require the original values to be close, which will of course result in rather creative matches. To accommodate for this, you simply generate more base vectors B, C, etc. (say n of them) and keep a separate list for each. Take the k best matches on all of them, sort that list of k*n 64-dimensional vectors according to their Euclidean distance to the query vector, perform homography on the best ones and select the one with the lowest projection error.
The neat part about this is that if you have n (random, normalized) projection axes and want to search in 64-dimensional space, you are simply multiplying each descriptor by an n x 64 matrix, resulting in n scalars.
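A minimal sketch of the projection index, assuming descriptors are rows of a NumPy array; the class name, the number of axes and the window size are illustrative, not from any library. It takes up to k entries on each side of the query's position in each sorted list, then re-ranks the union by exact Euclidean distance. The homography step is not shown.

```python
import numpy as np


class ProjectionIndex:
    """Approximate nearest neighbours via n random 1-D projections."""

    def __init__(self, descriptors, n_axes=8, seed=0):
        rng = np.random.default_rng(seed)
        axes = rng.standard_normal((n_axes, descriptors.shape[1]))
        self.axes = axes / np.linalg.norm(axes, axis=1, keepdims=True)  # n random unit vectors
        self.desc = descriptors
        proj = descriptors @ self.axes.T          # (N, n): the scalars a_n for every axis at once
        self.order = np.argsort(proj, axis=0)     # descriptor ids sorted by projection, per axis
        self.sorted_vals = np.take_along_axis(proj, self.order, axis=0)

    def query(self, q, k=5, top=3):
        qp = self.axes @ q                        # the query scalar for every axis
        cand = set()
        for a in range(len(qp)):
            vals = self.sorted_vals[:, a]
            pos = int(np.searchsorted(vals, qp[a]))  # binary search for q's place in the list
            lo, hi = max(pos - k, 0), min(pos + k, len(vals))
            cand.update(self.order[lo:hi, a].tolist())
        cand = sorted(cand)
        dists = np.linalg.norm(self.desc[cand] - q, axis=1)  # exact L2 re-ranking of the candidates
        return [cand[i] for i in np.argsort(dists)[:top]]
```

The single matrix product in `__init__` is the "n x 64 matrix" multiplication described above, applied to all descriptors at once.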
I am pretty sure that the distance is calculated between the descriptors and not their coordinates (x,y). You can only directly compare one descriptor against another. I propose the following possible solution (surely not the optimal one):
For each descriptor in the query image, find the top-k nearest neighbors in your dataset, then take all the top-k lists and find the most common image among them.
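A minimal sketch of the voting idea, assuming NumPy arrays of descriptors and one image label per database descriptor (the function and parameter names are hypothetical). It uses a brute-force nearest-neighbour scan for clarity; in practice you would swap in an approximate index such as FLANN.

```python
from collections import Counter

import numpy as np


def most_common_image(query_desc, db_desc, db_image_ids, k=2):
    """Each query descriptor votes for the images of its k nearest database descriptors.

    query_desc: (Q, D) array; db_desc: (N, D) array; db_image_ids: N image labels.
    """
    votes = Counter()
    for q in query_desc:
        d = np.linalg.norm(db_desc - q, axis=1)   # brute-force L2 distances
        for i in np.argsort(d)[:k]:
            votes[db_image_ids[i]] += 1
    return votes.most_common(1)[0][0]
```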
Given an elevation map consisting of lat/lon/elevation triples, what is the fastest way to find all points above a given elevation level (or better yet, just the 2D concave hull)?
I'm working on a GIS app where I need to render an overlay on top of a map to visually indicate regions of higher elevation; it's determining this polygon/region that has me stumped (for now). I have a simple array of lat/lon/elevation triples (more specifically, the GTOPO30 DEM files), but I'm free to transform that into any data structure you would suggest.
We've been pointed toward Triangulated Irregular Networks (TINs), but I'm not sure how to efficiently query that data once we've generated the TIN. I wouldn't be surprised if our problem could be solved similarly to how one would generate a contour map, but I don't have any experience with it. Any suggestions would be awesome.
It sounds like you're attempting to create a polygonal representation of the boundary of the high land.
If you're working with raster data (sampled on a rectangular grid), try this.
Think of your grid as an assembly of right triangles.
Let's say you have a 3x3 grid of points
a b c
d e f
g h k
Your triangles are:
abd part of the rectangle abed
bde the other part of the rectangle abed
bce part of the rectangle bcfe
cef the other part of the rectangle bcfe
dge ... and so on
Your algorithm has these steps:
1. Build a list of triangles that are above the elevation threshold.
2. Take the union of these triangles to make a polygonal area.
3. Determine the boundary of the polygon.
4. If necessary, smooth the polygon boundary to make your layer look OK when displayed.
If you're trying to generate good-looking contour lines, step 4 is very hard to do right.
Step 1 is the key to this problem.
For each triangle: if all three vertices are above the threshold, include the whole triangle in your list. If all are below, forget about the triangle. If some vertices are above and others below, split your triangle into three by adding new vertices that lie precisely on the elevation line (found by interpolating the elevation along the edges). Include the one or two of those new triangles that lie above the line in your highland list.
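A minimal sketch of step 1 for a single triangle, assuming vertices are (x, y, elevation) tuples and treating a vertex exactly at the threshold as above; the function names are hypothetical. The crossing points come from linear interpolation along each edge that straddles the threshold. With one high corner the high part is a single triangle; with two it is a quadrilateral, split here into two triangles, which matches the three-way split described above.

```python
def crossing(p, q, t):
    """Point on edge p-q where the elevation equals t (p and q straddle t)."""
    s = (t - p[2]) / (q[2] - p[2])
    return (p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1]), t)


def highland_triangles(tri, t):
    """Triangles covering the part of `tri` whose elevation is >= t."""
    above = [v for v in tri if v[2] >= t]
    below = [v for v in tri if v[2] < t]
    if not below:
        return [list(tri)]                      # whole triangle is high
    if not above:
        return []                               # whole triangle is low: forget it
    if len(above) == 1:
        a = above[0]                            # one high corner: a single small triangle
        return [[a, crossing(a, below[0], t), crossing(a, below[1], t)]]
    a, b = above                                # two high corners: a quadrilateral,
    c = below[0]                                # split into two triangles
    pa, pb = crossing(a, c, t), crossing(b, c, t)
    return [[a, b, pb], [a, pb, pa]]
```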
For the rest of the steps you'll need a decent 2d geometry processing library.
If your points are not on a regular grid, start by using the Delaunay algorithm (which you can look up) to organize your points into triangles. Then follow the same algorithm I mentioned above. Warning: this is going to look kind of sketchy if you don't have many points.
Assuming you have the lat/lon/elevation data stored in an array (or three separate arrays), you should be able to use array querying techniques to select all of the points where the elevation is above a certain threshold. For example, in Python with NumPy you can do:
indices = numpy.where(array > value)
The indices variable will then contain the indices of all elements of array greater than the threshold value. Similar commands are available in various other languages (for example, IDL has the WHERE() command, and similar things can be done in Matlab).
Once you've got this list of indices you could create a new binary array where each place where the threshold was satisfied is set to 1:
binary_array[indices] = 1
(This assumes you've created a blank array of the same size as your original lat/long/elevation array and called it binary_array.)
If you're working with raster data (which I would recommend for this type of work), you may find that you can simply overlay this array on a map and get a nice set of regions appearing. However, if you need to convert the areas above the elevation threshold to vector polygons, you could use one of many built-in GIS methods to convert raster->vector.
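In NumPy the comparison itself already yields the binary array, so the `where` call plus assignment can collapse into one expression. The sketch below, with hypothetical function names and assuming the elevations form a 2D raster, also marks the boundary cells of the high regions (masked cells with at least one 4-neighbour outside the mask), a crude starting point before a proper raster-to-vector conversion.

```python
import numpy as np


def highland_mask(elevation, threshold):
    """Binary raster: True where the elevation is above the threshold."""
    return elevation > threshold


def boundary_cells(mask):
    """Masked cells with at least one 4-neighbour outside the mask or the raster."""
    padded = np.pad(mask, 1, constant_values=False)
    all_neighbours_high = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                           padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~all_neighbours_high
```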
I would use a nested C-squares arrangement, with each square having a pre-calculated maximum ground height. This would allow me to scan at a high level, discarding any squares whose maximum height is not above the search height, and drilling further into those squares where parts of the ground are above the search height.
If you're working with various set levels of search height, you could precalculate the convex hull for each predefined level for the smallest squares you decide to use (or for all the squares, for that matter).
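A minimal sketch of the drill-down idea, using a simple 2x2 subdivision of the raster (a max pyramid, i.e. a quadtree of maxima) as a stand-in for nested C-squares; real C-squares have their own cell notation and subdivision scheme, which is not reproduced here. The function names are hypothetical, and the raster is assumed to be a 2D NumPy array.

```python
import numpy as np


def build_max_pyramid(elev):
    """levels[0] is the raster; each further level holds the max of 2x2 blocks below it."""
    levels = [elev]
    while levels[-1].shape[0] > 1 or levels[-1].shape[1] > 1:
        e = levels[-1]
        r, c = e.shape
        # Pad to even size with -inf so the padding never wins a max.
        p = np.full((r + r % 2, c + c % 2), -np.inf)
        p[:r, :c] = e
        levels.append(p.reshape(p.shape[0] // 2, 2, p.shape[1] // 2, 2).max(axis=(1, 3)))
    return levels


def cells_above(levels, threshold):
    """Descend from the coarsest square, skipping squares whose max is not above threshold."""
    out = []
    stack = [(len(levels) - 1, 0, 0)]
    while stack:
        lvl, i, j = stack.pop()
        e = levels[lvl]
        if i >= e.shape[0] or j >= e.shape[1] or e[i, j] <= threshold:
            continue                      # square is all low (or outside the raster): prune it
        if lvl == 0:
            out.append((i, j))
            continue
        for di in (0, 1):
            for dj in (0, 1):
                stack.append((lvl - 1, 2 * i + di, 2 * j + dj))
    return sorted(out)
```

Large low-lying areas are rejected after a single comparison at a coarse level, which is the benefit the pre-calculated maxima are meant to provide.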
I'm not sure whether your lat/lon/alt points are on a regular grid or not, but if not, perhaps they could be interpolated to represent even 100 ft altitude increments and uniform lat/lon divisions (bearing in mind that this does not give uniform distance divisions). But if that would work, why not precompute a three-dimensional array whose indices represent altitude, latitude and longitude respectively? Then when the aircraft needs data about points at or above an altitude for a specific piece of terrain, the code only needs to read a small part of this array, which is indexed so that adjacent "voxels" are contiguous in the indexing scheme.
Of course, the increments in longitude would not have to be uniform: if uniform distances are required, the same scheme would work, but the indexes for longitude would point to a nonuniformly spaced set of longitudes.
I don't think there would be any faster way of searching this data.
It's not clear from your question if the set of points is static and you need to find what points are above a given elevation many times, or if you only need to do the query once.
The easiest solution is to just store the points in an array sorted by elevation. Finding all points in a certain elevation range is then just a binary search, and you only need to sort once.
If you only need to do the query once, just do a linear search through the array in the order you got it. Building any fancier data structure from the array takes at least O(n) anyway, so you won't get better results by complicating things.
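A minimal sketch of the sort-once, binary-search approach using Python's `bisect` module, assuming points are (lat, lon, elevation) tuples; the class and method names are hypothetical.

```python
import bisect


class ElevationIndex:
    def __init__(self, points):
        """points: iterable of (lat, lon, elevation); sorted once by elevation."""
        self.points = sorted(points, key=lambda p: p[2])
        self.elevs = [p[2] for p in self.points]

    def above(self, level):
        """All points with elevation strictly greater than level."""
        return self.points[bisect.bisect_right(self.elevs, level):]

    def between(self, lo, hi):
        """All points with lo <= elevation <= hi."""
        i = bisect.bisect_left(self.elevs, lo)
        j = bisect.bisect_right(self.elevs, hi)
        return self.points[i:j]
```

Each query costs O(log n) to find the boundary plus the size of the returned slice.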
If you have some other requirements, like say you need to efficiently list all points inside some rectangle the user is viewing, or that points can be added or deleted at runtime, then a different data structure might be better. Presumably some sort of tree or grid.
If all you care about is rendering, you can do this very efficiently on graphics hardware, with no need for a fancy data structure at all: you can just send the triangles to the GPU and have it kill (discard) fragments above or below a certain elevation.