I have some photos of white pages with some black points drawn on, like this:
photo (the points aren't very circular, I can draw them better),
and I would like to find the coordinates of these points.
I can binarize the images (the previous photo binarized: image), but how can I find the coordinates of these black points? I need only the coordinates of one pixel for each point, the approximate center.
This is for a school assignment.
Since it's for school work, I will only provide you with a high-level algorithm.
Since the background is guaranteed to be white, you are in luck.
First you need to define a threshold for how dark a pixel must be before you consider it part of a black dot.
#ffffff is pure white and #000000 is pure black. I would suggest somewhere around #383838 as your threshold.
Then you make a two dimensional bool array to keep track of which pixel you have visited already.
Now we can start looking at the picture.
You read the pixels one at a time, row by row, and check whether each pixel is darker than the threshold. If it is, you do a DFS or BFS to find the entire connected area of neighbouring pixels that are also darker than the threshold.
During the process you will be marking the bool array we created earlier to indicate that you have already visited the pixel.
Since it's a roughly circular point, you can just take the min and max of the x and y coordinates and calculate the center point from them.
Once you are done with one point, keep looping through the picture's pixels to find dark pixels that you have not visited yet (false in the bool array).
Since the points in your photo have some small dots on the edge which are not connected to the large point, you might have to do some math to check whether the radius is > some number before considering it a valid point. Or, instead of a radius-1 neighbourhood, do the BFS/DFS with a 5-10 pixel neighbourhood to include the ones that are really close to the main point.
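To make the outline above concrete, here is a minimal sketch of the scan-and-flood-fill idea. It assumes the image is already available as a 2D list of 0-255 gray values, and that a pixel counts as part of a dot when its value is below the threshold. The function name `find_dot_centers` and the `min_size` filter are made up for illustration.

```python
from collections import deque

def find_dot_centers(gray, threshold=0x38, min_size=3):
    """Return (x, y) centers of dark blobs in a 2D list of 0-255 gray values."""
    height, width = len(gray), len(gray[0])
    visited = [[False] * width for _ in range(height)]
    centers = []
    for y in range(height):
        for x in range(width):
            if visited[y][x] or gray[y][x] >= threshold:
                continue
            # New dark pixel: flood-fill its connected region (8 neighbours).
            queue = deque([(x, y)])
            visited[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and not visited[ny][nx]
                                and gray[ny][nx] < threshold):
                            visited[ny][nx] = True
                            queue.append((nx, ny))
            # Skip specks whose bounding box is too small to be a real dot.
            if max_x - min_x + 1 >= min_size and max_y - min_y + 1 >= min_size:
                centers.append(((min_x + max_x) // 2, (min_y + max_y) // 2))
    return centers
```

The size filter is one possible way to drop the stray edge pixels; widening the neighbourhood searched in the inner loops would be the other option mentioned above.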
The basics for processing image data can be found in other questions, so I won't go into deeper detail about that, but for the threshold check specifically, I'd do it by gathering the red, green and blue bytes of each pixel (as indicated in the answer I linked), and then just combine them to a Color c = Color.FromArgb(r,g,b) and testing that to be "dark" using c.GetBrightness() < brightnessThreshold. A value of 0.4 was a good threshold for your test image.
You should store the result of this threshold detection in an array in which each item is a value that indicates whether the threshold check passed or failed. This means you can use something as simple as a two-dimensional Boolean array with the original image's height and width.
If you already have methods of doing all that, all the better. Just make sure you got some kind of array in which you can easily look up the result of that binarization. If the method you have gives you the result as image, you will be more likely to end up with a simple one-dimensional byte array, but then your lookups will simply be of a format like imagedata[y * stride + x]. This is functionally identical to how internal lookups in a two-dimensional array happen, so it won't be any less efficient.
Now, the real stuff in here, as I said in my comment, would be an algorithm to detect which pixels should be grouped together to one "blob".
The general usage of this algorithm is to loop over every single pixel on the image, then check if A) it cleared the threshold, and B) it isn't already in one of your existing detected blobs. If the pixel qualifies, generate a new list of all threshold-passed pixels connected to this one, and add that new list to your list of detected blobs. I used the Point class to collect coordinates, making each of my blobs a List<Point>, and my collection of blobs a List<List<Point>>.
As for the algorithm itself, what you do is make two collections of points. One is the full collection of neighbouring points you're building up (the points list), the other is the current edge you're scanning (the current edge list). The current edge list will start out containing your origin point, and the following steps will loop as long as there are items in your current edge list:
Add all items from the current edge list into the full points list.
Make a new collection for your next edge (the next edge list).
For each point in your current edge list, get a list of its directly neighbouring points (excluding any that would fall outside the image bounds), and check for all of these points if they clear the threshold, and if they are not already in either the points list or the next edge list. Add the points that pass the checks to the next edge list.
After this loop through the current edge list ends, replace the original current edge list by the next edge list.
...and, as I said, loop these steps as long as your current edge list after this last step is not empty.
This will create an edge that expands until it matches all threshold-clearing pixels, and will add them all to the list. Eventually, as all neighbouring pixels end up in the main list, the new generated edge list will become empty, and the algorithm will end. Then you add your new points list to the list of blobs, and any pixels you loop over after that can be detected as already being in those blobs, so the algorithm is not repeated for them.
There are two ways of doing the neighbouring points; you either get the four points around it, or all eight. The difference is that using four will not make the algorithm do diagonal jumps, while using eight will. (An added effect is that one causes the algorithm to expand in a diamond shape, while the other expands in a square.) Since you seem to have some stray pixels around your blobs, I advise you to get all eight.
As Steve pointed out in his answer, a very quick way of doing checks to see if a point is present in a collection is to create a two-dimensional Boolean array with the dimensions of the image, e.g. Boolean[,] inBlob = new Boolean[height, width];, which you keep synchronized with the actual points list. So whenever you add a point, you also mark the [y, x] position in the Boolean array as true. This will make rather heavy checks of the if (collection.contains(point)) type as simple as if (inBlob[y,x]), which requires no iterations at all.
I had a List<Boolean[,]> inBlobs which I kept synced with the List<List<Point>> blobs I built, and in the expanding-edge algorithm I kept such a Boolean[,] for both the next edge list and the points list (the latter of which was added to inBlobs at the end).
As I commented, once you have your blobs, just loop over the points inside them per blob and get the minimums and maximums for both X and Y, so you end up with the boundaries of the blob. Then just take the averages of those to get the center of the blob.
Extras:
If all your dots are guaranteed to be a significant distance apart, a very easy way to get rid of floating edge pixels is to take the edge boundaries of each blob, expand them all by a certain threshold (I took 2 pixels for that), and then loop over these rectangles and check if any intersect, and merge those that do. The Rectangle class has both an IntersectsWith() for easy checks, and a static Rectangle.Inflate for increasing a rectangle's size.
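As a rough illustration of that merge step, here is a sketch in Python rather than with the .NET Rectangle class. Boxes are assumed to be `(left, top, right, bottom)` tuples with inclusive coordinates, and all function names are invented for this example.

```python
def inflate(rect, margin):
    """Grow a (left, top, right, bottom) box by margin pixels on every side."""
    left, top, right, bottom = rect
    return (left - margin, top - margin, right + margin, bottom + margin)

def intersects(a, b):
    """True if two inclusive boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union(a, b):
    """Smallest box containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def merge_close_boxes(boxes, margin=2):
    """Repeatedly merge boxes whose inflated versions intersect."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if intersects(inflate(boxes[i], margin), inflate(boxes[j], margin)):
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

def center(box):
    """Integer center of a box."""
    return ((box[0] + box[2]) // 2, (box[1] + box[3]) // 2)
```

Restarting the scan after every merge is quadratic or worse, but with a handful of dots per page that should not matter.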
You can optimise the memory usage of the fill method by only storing the edge points (threshold-matching points with non-matching neighbours in any of the four main directions) in the main list. The final boundaries, and thus the center, will remain the same. The important thing to remember then is that, while you exclude a bunch of points from the blob list, you should mark all of them in the Boolean[,] array that's used for checking the already-processed pixels. This doesn't take up any extra memory anyway.
The full algorithm, including optimisations, in action on your photo, using 0.4 as brightness threshold:
Blue are the detected blobs, red is the detected outline (by using the memory-optimised method), and the single green pixels indicate the center points of all blobs.
[Edit]
Since it's been almost a year since I posted this, I guess I might as well link to the implementation I made of this. I actually managed to use it myself about a month after I wrote it, when recreating the video compression algorithm of an old DOS game which used chunked up diff frames.
I am interested in generating a star system that uses seed-based randomized points in 3d space so that if the rendered point would move, the points in space would remain in their relative position from the global origin, effectively creating the illusion of a pre-seeded universe.
I have looked at previous solutions and the recommended Poisson Spheres, which generate random positions incrementally, but that isn't ideal for the core purpose. I would benefit from a minimum distance between all points, but this can be done by simply snapping the points to a grid and using a random seed to offset them based on world space.
Using something like 3D noise to reduce density is also an option, but I suspect it doesn't address the core issue of creating single points: if the algorithm simply finds the brightest peak in the 3D space, you get one point where you may want several in an area.
Ideally, being able to control the density with 3d noise would be a benefit.
The end result would be a system where feeding in a random string populates a larger seed that populates 3D points. Each point is referenceable by an ID based on its render parameters, allowing it to be found again from its Vector3 location relative to the global origin.
Research:
Generate random points in 3D space with minimum nearest-neighbor distance
How do I generate random points in 3D space?
Github Poisson Disc Sampling
Found a solution:
Using the origin of the player's reference and the relative offset from the grid, you search nearby "points" on the X, Y, Z.
You store these points as a key name in some sort of index or database, in my case a Vector3Int as a dictionary key.
When iterating through the above loop, you ensure the key does not exist in the dictionary already.
I repeated this process for 'cells' and then again for 'stars' inside the area
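A minimal sketch of how such a seed-keyed grid lookup could work, written in Python with a dictionary keyed by integer cell coordinates (standing in for the Vector3Int keys). The hashing scheme, the function names and the `cell_size` / `max_stars` parameters are assumptions for illustration, not the poster's actual implementation.

```python
import hashlib
import random

def stars_in_cell(seed, cell, cell_size=100.0, max_stars=3):
    """Deterministically generate star positions for one integer grid cell."""
    key = f"{seed}:{cell[0]}:{cell[1]}:{cell[2]}".encode("utf-8")
    # Derive a per-cell seed so the result does not depend on visiting order.
    cell_seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = random.Random(cell_seed)
    count = rng.randint(0, max_stars)
    return [tuple((c + rng.random()) * cell_size for c in cell)
            for _ in range(count)]

def stars_near(seed, origin, radius_cells=1, cell_size=100.0, cache=None):
    """Fill (and return) a dict of cell -> stars around a world-space origin."""
    cache = {} if cache is None else cache
    base = tuple(int(v // cell_size) for v in origin)
    for dx in range(-radius_cells, radius_cells + 1):
        for dy in range(-radius_cells, radius_cells + 1):
            for dz in range(-radius_cells, radius_cells + 1):
                cell = (base[0] + dx, base[1] + dy, base[2] + dz)
                if cell not in cache:  # only generate cells not seen before
                    cache[cell] = stars_in_cell(seed, cell, cell_size)
    return cache
```

Because every cell's content depends only on the seed string and the cell coordinates, the same star reappears at the same global position no matter where the player came from.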
I saw a lot of topics about this. I understand the theory, but I'm not able to code it.
I have some pictures and I want to determine if they are blurred or not. I found a library (aforge.dll) and I used it to compute an FFT for an image.
As an example, here are two images I'm working on:
My code is in C#:
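using System.Drawing;
using AForge.Imaging;

public Bitmap PerformFFT(Bitmap Picture)
{
    // Load the image; AForge expects an 8 bpp grayscale bitmap
    // whose width and height are powers of two.
    ComplexImage output = ComplexImage.FromBitmap(Picture);
    // Perform the forward FFT in place
    output.ForwardFourierTransform();
    // Return the frequency-domain image as a bitmap
    return output.ToBitmap();
}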
How can I determine if the image is blurred? I am not very comfortable with the theory, so I need a concrete example. I saw this post, but I have no idea how to do that.
EDIT:
I'll clarify my question. When I have a 2D array of complex ComplexImage output (image FFT), what is the C# code (or pseudo code) I can use to determine if image is blurred ?
The concept of "blurred" is subjective. How much power at high frequencies indicates it's not blurry? Note that a blurry image of a complex scene has more power at high frequencies than a sharp image of a very simple scene. For example a sharp picture of a completely uniform scene has no high frequencies whatsoever. Thus it is impossible to define a unique blurriness measure.
What is possible is to compare two images of the same scene, and determine which one is more blurry (or equivalently, which one is sharper). This is what is used in automatic focussing. I don't know exactly what process commercial cameras use, but in microscopy, images are taken at a series of focal depths, and compared.
One of the classical comparison methods doesn't involve Fourier transforms at all. One computes the local variance (for each pixel, take a small window around it and compute the variance for those values), and averages it across the image. The image with the highest variance has the best focus.
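A small sketch of that local-variance measure, in plain Python on a 2D list of gray values. The function name and the default window size are arbitrary choices; a real implementation would more likely use an image library's filters.

```python
def mean_local_variance(gray, window=3):
    """Average, over all window positions, of the variance inside the window."""
    height, width = len(gray), len(gray[0])
    total, count = 0.0, 0
    for y in range(height - window + 1):
        for x in range(width - window + 1):
            values = [gray[y + j][x + i]
                      for j in range(window) for i in range(window)]
            mean = sum(values) / len(values)
            total += sum((v - mean) ** 2 for v in values) / len(values)
            count += 1
    return total / count
```

The number it returns only means something relative to other images of the same scene: of two shots, the one with the higher value would be taken as the better focused one.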
Comparing high vs low frequencies as in MBo's answer would be comparable to computing the Laplace filtered image, and averaging its absolute values (because it can return negative values). The Laplace filter is a high-pass filter, meaning that low frequencies are removed. Since the power in the high frequencies gives a relative measure of sharpness, this statistic does too (again relative, it is to be compared only to images of the same scene, taken under identical circumstances).
Blurred image has FFT result with smaller magnitude in high-frequency regions. Array elements with low indexes (near Result[0][0]) represent low-frequency region.
So divide the resulting array by some criterion, sum the magnitudes in both regions and compare them. For example, select a quarter of the result array (of size M) with index_x < M/2 and index_y < M/2.
For series of more and more blurred image (for the same initial image) you should see higher and higher ratio Sum(Low)/Sum(High)
Result is a square array NxN. Its magnitudes have central symmetry (|F(x,y)| = |F(-x,-y)|, because the source is purely real and F(-x,-y) is therefore the complex conjugate of F(x,y)), so it is enough to treat the top half of the array with y<N/2.
Low-frequency components are located near top-left and top-right corners of array (smallest values of y, smallest and highest values of x). So sum magnitudes of array elements in ranges
for y in range 0..N/2
    for x in range 0..N
        amp = magnitude(y,x)
        if (y<N/4) and ((x<N/4) or (x>=3*N/4))
            low = low + amp
        else
            high = high + amp
Note that your picture shows jumbled array pieces - this is standard practice to show zero component in the center.
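The pseudocode above could be turned into something runnable along these lines. This is only a sketch: it expects the unshifted layout (zero frequency at index [0][0]), and the naive `dft2_magnitudes` helper is a stand-in so the example is self-contained; it is far too slow for real images, where you would take the magnitudes from your FFT library instead.

```python
import cmath

def dft2_magnitudes(image):
    """Naive O(N^4) 2D DFT magnitudes of a square image (tiny inputs only)."""
    n = len(image)
    mags = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            acc = 0j
            for y in range(n):
                for x in range(n):
                    angle = -2j * cmath.pi * (u * x + v * y) / n
                    acc += image[y][x] * cmath.exp(angle)
            mags[v][u] = abs(acc)
    return mags

def low_high_ratio(mags):
    """Sum(Low)/Sum(High) over the top half of an unshifted magnitude array."""
    n = len(mags)
    low = high = 0.0
    for y in range(n // 2):
        for x in range(n):
            if y < n // 4 and (x < n // 4 or x >= 3 * n // 4):
                low += mags[y][x]
            else:
                high += mags[y][x]
    # A perfectly uniform image has no high-frequency content at all.
    return low / high if high > 0 else float("inf")
```

For a series of increasingly blurred versions of the same picture, the returned ratio should grow.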
I have a fairly simple object with a shape defined by 12 vertices. When doing a hidden-lines calculation on this object (I am using Cad Control to do this), it returns a collection of lines making up the shape, which is usually many more than the minimum number of lines needed to draw such a shape. Please see the attached picture:
Each segment between points is a line. I would like to remove points that are marked in red color leaving only minimum count (yellow cross) necessary to draw shape.
One approach would be to sort them clockwise and then loop through them checking if a cross product of three adjacent points in the list is zero and then deleting the middle one. Unfortunately, it is impossible to predict, how points will be sorted, therefore this is not an option.
Second approach would be to loop through the collection of lines offered by cad control and to find all points that are on the same line, sort them (pointsLineA, pointsLineB, pointsLineC, etc) From there it would be much easier.
So far I have accomplished that I loop through line collection (get each lines points) and in nested loop I loop through the same collection(copy of it) to check if the points of any random line in the collection lie on the same line as points from line from first loop. This involves two loops and modifying collections on the run. To make it short, it is a MESS. If you would like to see code sample, please let me know.
To make sure everything is clear - my first objective is to group points so that in each group would appear points only belonging to one line. Any suggestions?
With 12 vertices (or hundreds of vertices) I wouldn't do space partitioning (I'm thinking about adaptive quadtree, 2D-tree (kd-tree with k=2)).
I'd store for each vertex which lines it belongs to (it's easier to assign an ID to each vertex and line instead of comparing each time the coordinates of vertices).
vertex(1)=(2.5,3.97) <- vertex coordinates
vertex(2)=( 13.453 , 24.687 )
lines_for_vertex(1)={1,5} if vertex 1 is member of lines 1 and 5
lines_for_vertex(3)={2,5,7} if vertex 3 is member of lines 2,5 and 7
lines_for_vertex(9)={4} if vertex 9 is member of line 4 (edge or segment not connected)
lines_for_vertex(10)={} if vertex 10 is not connected (not member of any segment)
(maybe some cases are impossible for you)
You can assign ID to lines with position inside your collection of lines.
In any case, whether you make these changes or keep your collections of lines, inside the nested loops you have to collect the information about the points to be deleted without changing anything.
So instead of doing:
if vertices are aligned then remove the vertex in the middle
you fill a list 'to_remove' with this information:
to_remove.add(vertex in the middle) <- with the ID is easier
Then when the two loops end, you can remove all the vertices collected in the list. If you have the array 'lines_for_vertex' it's easy to find the two segments to be collapsed into one (eg. if vertex to remove is 1, the collapsing lines are 1 and 5).
If you build a structure even for lines, referring to ID of its vertices
e.g. line(5)={1,3} if line with ID=5 connects vertices 1 and 3
(compare with lines_for_vertex above), it's easier to know how to collapse lines.
You need to retrieve the topology of the polygon. This means rearranging the vertices in a closed loop. By comparing the endpoint coordinates, you can find those that match and obtain a graph with edges between endpoints, and endpoints merged.
From this representation you can easily detect and remove the alignments.
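A possible sketch of that idea, combined with the "collect first, remove afterwards" advice from the other answer. It assumes segments are given as pairs of `(x, y)` tuples whose shared endpoints match exactly, and that a vertex is removable when it joins exactly two segments and lies on the line through its two neighbours. The function name and the `eps` tolerance are illustrative.

```python
from collections import defaultdict

def simplify_segments(segments, eps=1e-9):
    """Merge collinear segments meeting at a vertex shared by exactly two."""
    neighbours = defaultdict(set)
    for a, b in segments:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Pass 1: only collect the vertices to remove; change nothing yet.
    to_remove = []
    for v, adj in neighbours.items():
        if len(adj) != 2:
            continue
        a, b = adj
        cross = (v[0] - a[0]) * (b[1] - a[1]) - (v[1] - a[1]) * (b[0] - a[0])
        if abs(cross) <= eps:
            to_remove.append(v)

    # Pass 2: collapse the two segments around each collected vertex.
    for v in to_remove:
        a, b = neighbours.pop(v)
        neighbours[a].discard(v)
        neighbours[b].discard(v)
        neighbours[a].add(b)
        neighbours[b].add(a)

    return sorted({tuple(sorted((a, b)))
                   for a in neighbours for b in neighbours[a]})
```

If the CAD control returns endpoints that differ by rounding noise, they would first need to be snapped to a common value (for example by rounding) before being used as dictionary keys.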
I am using the SURF algorithm in C# (OpenSurf) to get a list of interest points from an image. Each of these interest points contains a vector of descriptors, an x coordinate (int), a y coordinate (int), the scale (float) and the orientation (float).
Now, I want to compare the interest points from one image to a list of images in a database which also have a list of interest points, to find the most similar image. That is: [Image(I.P.)] COMPARETO [List of Images(I.P.)]. => Best match. Comparing the images on an individual basis yields unsatisfactory results.
When searching stackoverflow or other sites, the best solution I have found is to build a FLANN index while at the same time keeping track of where the interest points come from. But before implementation, I have some questions which puzzle me:
1) When matching images based on their SURF interest points an algorithm I have found does the matching by comparing their distance (x1,y1->x2,y2) with each other and finding the image with the lowest total distance. Are the descriptors or orientation never used when comparing interest points?
2) If the descriptors are used, then how do I compare them? I can't figure out how to compare X vectors of 64 points (1 image) with Y vectors of 64 points (several images) using an indexed tree.
I would really appreciate some help. All the places I have searched or API I found, only support matching one picture to another, but not to match one picture effectively to a list of pictures.
There are multiple things here.
In order to know two images are (almost) equal, you have to find the homographic projection of the two such that the projection results in a minimal error between the projected feature locations. Brute-forcing that is possible but not efficient, so a trick is to assume that similar images tend to have the feature locations in the same spot as well (give or take a bit). For example, when stitching images, the image to stitch are usually taken only from a slightly different angle and/or location; even if not, the distances will likely grow ("proportionally") to the difference in orientation.
This means that you can - as a broad phase - select candidate images by finding k pairs of points with minimum spatial distance (the k nearest neighbors) between all pairs of images and perform homography only on these points. Only then you compare the projected point-pairwise spatial distance and sort the images by said distance; the lowest distance implies the best possible match (given the circumstances).
If I'm not mistaken, the descriptors are oriented by the strongest angle in the angle histogram. That means you may also decide to take the euclidean (L2) distance of the 64- or 128-dimensional feature descriptors directly to obtain the actual feature-space similarity of two given features and perform homography on the best k candidates. (You will not compare the scale in which the descriptors were found though, because that would defeat the purpose of scale invariance.)
Both options are time consuming and directly depend on the number of images and features; in other words: stupid idea.
Approximate Nearest Neighbors
A neat trick is to not use actual distances at all, but approximate distances instead. In other words, you want an approximate nearest neighbor algorithm, and FLANN (although not for .NET) would be one of them.
One key point here is the projection search algorithm. It works like this:
Assuming you want to compare the descriptors in 64-dimensional feature space. You generate a random 64-dimensional vector and normalize it, resulting in an arbitrary unit vector in feature space; let's call it A. Now (during indexing) you form the dot product of each descriptor against this vector. This projects each 64-d vector onto A, resulting in a single, real number a_n. (This value a_n represents the distance of the descriptor along A in relation to A's origin.)
This image I borrowed from this answer on CrossValidated regarding PCA demonstrates it visually; think about the rotation as the result of different random choices of A, where the red dots correspond to the projections (and thus, scalars a_n). The red lines show the error you make by using that approach, this is what makes the search approximate.
You will need A again for search, so you store it. You also keep track of each projected value a_n and the descriptor it came from; furthermore you align each a_n (with a link to its descriptor) in a list, sorted by a_n.
To clarify using another image from here, we're interested in the location of the projected points along the axis A:
The values a_0 .. a_3 of the 4 projected points in the image are approximately sqrt(0.5²+1.5²)=1.58, sqrt(0.4²+1.1²)=1.17, -0.84 and -0.95, corresponding to their distance to A's origin.
If you now want to find similar images, you do the same: Project each descriptor onto A, resulting in a scalar q (query). Now you go to the position of q in the list and take the k surrounding entries. These are your approximate nearest neighbors. Now take the feature-space distance of these k values and sort by lowest distance - the top ones are your best candidates.
Coming back to the last picture, assume the topmost point is our query. Its projection is 1.58 and its approximate nearest neighbor (of the four projected points) is the one at 1.17. They're not really close in feature space, but given that we just compared two 64-dimensional vectors using only two values, it's not that bad either.
You see the limits there: similar projections do not at all require the original values to be close, so this will of course result in rather creative matches. To accommodate for this, you simply generate more base vectors B, C, etc. - say n of them - and keep track of a separate list for each. Take the k best matches on all of them, sort that list of k*n 64-dimensional vectors according to their euclidean distance to the query vector, perform homography on the best ones and select the one with the lowest projection error.
The neat part about this is that if you have n (random, normalized) projection axes and want to search in 64-dimensional space, you are simply multiplying each descriptor with a n x 64 matrix, resulting in n scalars.
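To make the projection search a bit more tangible, here is a small pure-Python sketch. The class name `ProjectionIndex` and its parameters are invented for this example, it leaves out the homography step entirely, and it is not how FLANN itself is implemented. In practice you would also store which image each descriptor came from.

```python
import bisect
import math
import random

def random_unit_vector(dim, rng):
    """Random direction in feature space, normalised to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class ProjectionIndex:
    def __init__(self, descriptors, n_axes=4, seed=0):
        rng = random.Random(seed)
        dim = len(descriptors[0])
        self.descriptors = descriptors
        self.axes = [random_unit_vector(dim, rng) for _ in range(n_axes)]
        # Per axis: (projection, descriptor index) pairs sorted by projection.
        self.lists = [sorted((dot(d, axis), i) for i, d in enumerate(descriptors))
                      for axis in self.axes]

    def query(self, q, k=3):
        """Return candidate descriptor indices, nearest (true L2) first."""
        candidates = set()
        for axis, entries in zip(self.axes, self.lists):
            pos = bisect.bisect_left(entries, (dot(q, axis), -1))
            # Take the entries surrounding the query's position on this axis.
            for _, i in entries[max(0, pos - k):pos + k]:
                candidates.add(i)
        # Re-rank the candidates by their real distance in feature space.
        return sorted(candidates, key=lambda i: math.dist(q, self.descriptors[i]))
```

With n axes and k neighbours per side, each query touches at most 2*k*n descriptors instead of the whole database, at the price of occasionally missing the true nearest neighbour.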
I am pretty sure that the distance is calculated between the descriptors and not their coordinates (x,y). You can compare directly only one descriptor against another. I propose the following possible solution (surely not the optimal)
For each descriptor in the query image, you can find the top-k nearest neighbors in your dataset, and then go through all the top-k lists to find the image that occurs most often.
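A brute-force sketch of that voting scheme, assuming the database is a flat list of `(image_id, descriptor)` pairs. The function name is made up, and the linear scan per query descriptor is only acceptable for small databases; an index would replace it in practice.

```python
import math
from collections import Counter

def best_matching_image(query_descriptors, database, k=2):
    """database: list of (image_id, descriptor); returns the most-voted id."""
    votes = Counter()
    for q in query_descriptors:
        # Top-k database descriptors closest to this query descriptor.
        nearest = sorted(database, key=lambda entry: math.dist(q, entry[1]))[:k]
        votes.update(image_id for image_id, _ in nearest)
    return votes.most_common(1)[0][0]
```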
Given an elevation map consisting of lat/lon/elevation pairs, what is the fastest way to find all points above a given elevation level (or better yet, just the 2D concave hull)?
I'm working on a GIS app where I need to render an overlay on top of a map to visually indicate regions that are of higher elevation; it's determining this polygon/region that has me stumped (for now). I have a simple array of lat/lon/elevation pairs (more specifically, the GTOPO30 DEM files), but I'm free to transform that into any data structure that you would suggest.
We've been pointed toward Triangulated Irregular Networks (TINs), but I'm not sure how to efficiently query that data once we've generated the TIN. I wouldn't be surprised if our problem could be solved similarly to how one would generate a contour map, but I don't have any experience with it. Any suggestions would be awesome.
It sounds like you're attempting to create a polygonal representation of the boundary of the high land.
If you're working with raster data (sampled on a rectangular grid), try this.
Think of your grid as an assembly of right triangles.
Let's say you have a 3x3 grid of points
a b c
d e f
g h k
Your triangles are:
abd part of the rectangle abed
bde the other part of the rectangle abed
bef part of the rectangle bcfe
cef the other part of the rectangle bcfe
dge ... and so on
Your algorithm has these steps.
1. Build a list of triangles that are above the elevation threshold.
2. Take the union of these triangles to make a polygonal area.
3. Determine the boundary of the polygon.
4. If necessary, smooth the polygon boundary to make your layer look ok when displayed.
If you're trying to generate good looking contour lines, step 4 is very hard to get right.
Step 1 is the key to this problem.
For each triangle, if all three vertices are above the threshold, include the whole triangle in your list. If all are below, forget about the triangle. If some vertices are above and others below, split your triangle into three by adding new vertices that lie precisely on the elevation line (by interpolating elevation). Include the one or two of those new triangles that lie above the threshold in your highland list.
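Step 1 might look roughly like this for a single triangle. Vertices are assumed to be `(x, y, elevation)` tuples, the function names are invented for this sketch, and the winding order of the returned triangles is not preserved.

```python
def interpolate(p, q, level):
    """Point on edge p-q whose elevation equals level (p, q on opposite sides)."""
    t = (level - p[2]) / (q[2] - p[2])
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]), level)

def clip_triangle(tri, level):
    """Return the part of a triangle at or above level, as a list of triangles."""
    above = [p for p in tri if p[2] >= level]
    below = [p for p in tri if p[2] < level]
    if len(above) == 3:
        return [tuple(tri)]
    if len(above) == 0:
        return []
    if len(above) == 1:
        # Only the tip around the single high vertex survives.
        a = above[0]
        return [(a, interpolate(a, below[0], level), interpolate(a, below[1], level))]
    # Two vertices above: the high part is a quadrilateral, split in two.
    a, b = above
    c = below[0]
    ac, bc = interpolate(a, c, level), interpolate(b, c, level)
    return [(a, b, bc), (a, bc, ac)]
```

Running this over every triangle of the grid gives the list for step 1; the union and boundary extraction in steps 2 and 3 are where the geometry library comes in.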
For the rest of the steps you'll need a decent 2d geometry processing library.
If your points are not on a regular grid, start by using the Delaunay algorithm (which you can look up) to organize your points into triangles. Then follow the same algorithm I mentioned above. Warning: this is going to look kind of sketchy if you don't have many points.
Assuming you have the lat/lon/elevation data stored in an array (or three separate arrays) you should be able to use array querying techniques to select all of the points where the elevation is above a certain threshold. For example, in python with numpy you can do:
import numpy as np
indices = np.where(array > value)
And the indices variable will contain the indices of all elements of array greater than the threshold value. Similar commands are available in various other languages (for example IDL has the WHERE() command, and similar things can be done in Matlab).
Once you've got this list of indices you could create a new binary array where each place where the threshold was satisfied is set to 1:
binary_array[indices] = 1
(Assuming you've created a blank array of the same size as your original lat/long/elevation and called it binary_array.)
If you're working with raster data (which I would recommend for this type of work), you may find that you can simply overlay this array on a map and get a nice set of regions appearing. However, if you need to convert the areas above the elevation threshold to vector polygons then you could use one of many inbuilt GIS methods to convert raster->vector.
I would use a nested C-squares arrangement, with each square having a pre-calculated maximum ground height. This would allow me to scan at a high level, discarding any squares where the max height is not above the search height, and drilling further into those squares where parts of the ground were above the search height.
If you're working to various set levels of search height, you could precalculate the convex hull for the various predefined levels for the smallest squares that you decide to use (or all the squares, for that matter.)
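The nested-squares idea can be sketched with an ordinary quadtree over a square grid whose side is a power of two. This uses plain grid indices rather than actual C-squares codes, and the function names are made up for illustration.

```python
def build_max_tree(grid, x0, y0, size):
    """Return (max_height, children) for the square at (x0, y0) of given size."""
    if size == 1:
        return (grid[y0][x0], None)
    half = size // 2
    children = [build_max_tree(grid, x0 + dx, y0 + dy, half)
                for dy in (0, half) for dx in (0, half)]
    return (max(child[0] for child in children), children)

def cells_above(node, x0, y0, size, level, out):
    """Append to out the (x, y) of every cell strictly above level."""
    if node[0] <= level:
        return  # nothing in this square rises above the search height
    if node[1] is None:
        out.append((x0, y0))
        return
    half = size // 2
    offsets = [(dx, dy) for dy in (0, half) for dx in (0, half)]
    for child, (dx, dy) in zip(node[1], offsets):
        cells_above(child, x0 + dx, y0 + dy, half, level, out)
```

The tree is built once; each query then skips whole squares whose precomputed maximum is too low.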
I'm not sure whether your lat/lon/alt points are on a regular grid or not, but if not, perhaps they could be interpolated to represent even 100 ft altitude increments, and uniform lat/lon divisions (bearing in mind that that does not give uniform distance divisions). But if that would work, why not precompute a three dimensional array, where the indices represent altitude, latitude, and longitude respectively. Then when the aircraft needs data about points at or above an altitude, for a specific piece of terrain, the code only needs to read out a small part of the data in this array, which is indexed to make contiguous "voxels" contiguous in the indexing scheme.
Of course, the increments in longitude would not have to be uniform: if uniform distances are required, the same scheme would work, but the indexes for longitude would point to a nonuniformly spaced set of longitudes.
I don't think there would be any faster way of searching this data.
It's not clear from your question if the set of points is static and you need to find what points are above a given elevation many times, or if you only need to do the query once.
The easiest solution is to just store the points in an array, sorted by elevation. Finding all points in a certain elevation range is just binary search, and you only need to sort once.
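For the sorted-array approach, a minimal Python sketch using the standard `bisect` module could look like this. Points are assumed to be `(lat, lon, elevation)` tuples and the function names are only illustrative.

```python
import bisect

def build_index(points):
    """Sort the points by elevation once; keep the elevations as a search key."""
    ordered = sorted(points, key=lambda p: p[2])
    elevations = [p[2] for p in ordered]
    return ordered, elevations

def points_above(index, level):
    """All points whose elevation is strictly greater than level."""
    ordered, elevations = index
    return ordered[bisect.bisect_right(elevations, level):]
```

After the one-time sort, each query costs a binary search plus the size of the slice it returns.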
If you only need to do the query once, just do a linear search through the array in the order you got it. Building a fancier data structure from the array is going to be O(n) anyway, so you won't get better results by complicating things.
If you have some other requirements, like say you need to efficiently list all points inside some rectangle the user is viewing, or that points can be added or deleted at runtime, then a different data structure might be better. Presumably some sort of tree or grid.
If all you care about is rendering, you can perform this very efficiently using graphics hardware, and there is no need to use a fancy data structure at all, you can just send triangles to the GPU and have it kill fragments above or below a certain elevation.